Systems Biology Visualization
Surabhi Agarwal
There has been a rapid accumulation of data from protein interaction, gene expression and metabolic pathway analysis. To derive meaningful information
out of this data, we need to develop integrative visualization techniques, which provide an insight
into its biological relevance.
Definition of the Problem
Action: Static image
Description of the action: Display the image and read the narration.
Audio Narration: We will consider the case study of glioma, a group of brain tumors. In the first part of the animation, we examine the regulation of genes in glioma through gene expression data analysis, which identifies the genes that are modulated (up- or down-regulated) in glioma. In the second part, we identify the metabolic pathways involved in glioma by studying protein interaction data. In the third part, we explore pathway databases and their features to study the pathways retrieved from the gene expression and protein interaction studies.
Master Layout (Part 1)
This animation consists of 3 parts:
Part 1: Gene Expression Data Analysis
Part 2: Protein Interaction Data Analysis
Part 3: Metabolic Profile Databases
http://www.genome.jp/kegg/
Choose the problem to study and extract the relevant data
Send the gene expression profile data as input to the tool
Compute the features related to gene regulation
Identify the up- or down-regulated genes
Definitions of the components: Part 1 – Gene expression data analysis
1. Interaction Data: Interaction data refers to information regarding the nature and type of bonding between various biological components. It can be protein interaction data, gene expression data or metabolic pathway data.
2. Visualization tools: Software tools that read interaction data and represent it in a graphical format, thereby providing a simplified biological insight, e.g. Cytoscape for protein interaction data and GeneSpring for gene expression data.
3. Microarray: Microarrays are printed on a solid surface, typically glass, and are used to study and analyze a large number of samples simultaneously in a high-throughput manner.
Gene Expression Profile Data–Option
Action: Option for the user to view Input or Output
Description of the action: The Data Generation box should be linked to Step 1. The Input box should be linked to the Step 2 input slides, and the Output box to the Step 3 output slides. The Visualization box should be linked to Step 4. This slide gives the user the option to go through only specific content from the animation.
Audio Narration: To view the protocol for submitting files, click on Input. To view the protocol for retrieving and analyzing output files, click on Output. To proceed to the full animation, click on the arrow.
DATA GENERATION
INPUT
OUTPUT
VISUALIZATION
Proceed to Full Animation
Step 1.a - Gene Expression Profile Data – Data Extraction from Experiments
Action: Schematic for extracting the data for the defined problem
Description of the action: Follow the animation. Re-draw the figures.
Audio Narration: Users can extract gene expression microarray data from microarray experiments. The normalized microarray data gives an insight into the regulation of the genes, which is assessed by analyzing the data with gene expression profile data analysis software. For a detailed insight into the microarray technique, see the OSCAR animation on Microarray Technologies.
Biological Samples e.g. gliomas
Microarray Chips
Scanned Slides
Biochemistry by A.L.Lehninger et al., 3rd edition
Step 1.b - Gene Expression Profile Data – Data Extraction from Databases
Audio Narration: Users can extract microarray data directly from experiments or from public repositories such as GEO DataSets from NCBI. Leading microarray research institutes maintain dedicated databases for the microarray data generated in their labs. Because of their large size, the data files are distributed as compressed files, which need to be downloaded and stored on a local computer. Here, as an example, we study the regulation of genes in glioma, a brain tumor. Gene expression data analysis gives a picture of the genes that are modulated (up- or down-regulated) in glioma.
Microarray Data Repository
Query Term High-Grade glioma
PMID ACCESSION NUMBER
PROTEIN NAME
GLIOMA TYPE
VALIDATION FOLD CHANGE
p-VALUE
Action: Schematic for extracting the data for the defined problem
Description of the action: Follow the animation and show the storage of files in the local system.
Microarray data file
Input: extracting microarray data for analysis
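The narration notes that repository files arrive compressed and must be stored locally before analysis. The following is a minimal sketch of that step, assuming the download is a single gzip-compressed file; the helper name `decompress_gz` and the directory layout are hypothetical and not part of any tool shown in the animation.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(archive: Path, out_dir: Path) -> Path:
    """Decompress one .gz file into out_dir and return the path of the result."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # "GSM456.CEL.gz" becomes "GSM456.CEL" (str.removesuffix needs Python 3.9+).
    target = out_dir / archive.name.removesuffix(".gz")
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the data instead of loading it all
    return target
```

Streaming with `shutil.copyfileobj` keeps memory use flat, which matters for the large files the narration mentions.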
Step 2: Gene Expression Profile Data - Input
http://www.genome.jp/kegg/
ADD PROJECT
ADD EXPERIMENT
UPLOAD DATA
SELECT PLATFORM
Name of the Project
Select Experimental Type
Agilent Single Color, Agilent Two Color, Affymetrix Copy Number, Affymetrix Expression, Illumina Association Analysis, Illumina Copy Number, Illumina Single Color, RealTime-PCR
Affymetrix Expression
Select Technology (if applicable)
Barley, Bovine, E. coli, B. subtilis, Drosophila, Human, Mouse, Maize
Human
Browse File Folder A/GSE123/GSM456.CEL
Glioma
Audio Narration: The software follows the input procedure sequentially. The initial steps are to add a new project and an experiment. While adding the experiment, the user needs to define the experiment type. Owing to a lack of standardization, microarray data is saved in various file formats such as CEL, GPR, GAL and CDT, and each tool supports one or more of these formats.
Action: Schematic for entering data and setting parameters
Description of the action: Follow the animation and re-draw the images to replicate the working of a software environment.
The technology used in microarray experiments refers to the reference organism used for making the microarray chip.
Step 3.a - Gene Expression Profile Data - Output
http://www.genome.jp/kegg/
Fold Change Cutoff: >= 8
Action: Schematic for interpreting the results of gene expression data analysis
Description of the action: Show the simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
Summary Statistics
Filter data - Fold Change
Functional Analysis - GO
Heat Map
Audio Narration: A high cutoff is provided to give significant results. During comparison, probe sets that satisfy the fold change cutoff (8 or more) in at least one condition pair are displayed in the result. Regulation is reported as the ratio of condition 1 to condition 2. Thus, the highlighted gene HMGCS1 is up-regulated in sample GSM34580 as compared to GSM34586.
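The filtering rule in the narration (keep a probe set if it meets the cutoff in at least one condition pair) can be sketched as follows. This is an illustration under stated assumptions, not the reference software's implementation: the signed fold change convention, the function names and the intensity values are hypothetical; only the gene and sample names come from the slide.

```python
def fold_change(a: float, b: float) -> float:
    """Signed fold change of condition a versus b (positive means up in a).

    Assumes strictly positive, linear-scale intensities.
    """
    ratio = a / b
    return ratio if ratio >= 1 else -1 / ratio


def filter_probes(expr, pairs, cutoff=8.0):
    """Keep probes whose absolute fold change meets the cutoff in at least one pair."""
    kept = {}
    for probe, values in expr.items():
        changes = {(a, b): fold_change(values[a], values[b]) for a, b in pairs}
        if any(abs(fc) >= cutoff for fc in changes.values()):
            kept[probe] = changes
    return kept


# Hypothetical normalized intensities; not taken from the source data.
expr = {
    "HMGCS1": {"GSM34580": 1600.0, "GSM34586": 100.0},
    "GENE_X": {"GSM34580": 210.0, "GSM34586": 200.0},
}
hits = filter_probes(expr, [("GSM34580", "GSM34586")])
```

With these made-up values only HMGCS1 passes, with a fold change of 16 for GSM34580 versus GSM34586.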
Step 3.b - Gene Expression Profile Data - Output
http://www.genome.jp/kegg/
Action: Schematic for interpreting the results of gene expression data analysis
Description of the action: The animator needs to re-draw all screenshots, as they have been taken from the reference software, and must not copy the image, or any part of it, into the final animation. Show the simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
Filter data - Fold Change
Functional Analysis - GO
Summary Statistics
Heat Map
Audio Narration: The heat map is a graphical visualization of gene regulation, determined by the fold change cutoff provided by the user. Up-regulated genes are marked in red and down-regulated genes in blue, as explained in the figure legend.
upregulated
downregulated
Legend for color coding of regulation
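The color legend can be expressed as a small function. This is a sketch assuming the regulation is given as a ratio of condition 1 to condition 2 and compared on a log2 scale; the function name and the third "neutral" label for genes below the cutoff are assumptions, since the slide only defines red and blue.

```python
import math


def regulation_color(ratio: float, cutoff: float = 8.0) -> str:
    """Return the legend color for an expression ratio (condition 1 / condition 2)."""
    log_ratio = math.log2(ratio)
    log_cutoff = math.log2(cutoff)
    if log_ratio >= log_cutoff:
        return "red"      # up-regulated
    if log_ratio <= -log_cutoff:
        return "blue"     # down-regulated
    return "neutral"      # below the cutoff; this label is an assumption
```

Working on the log2 scale makes up- and down-regulation symmetric: a ratio of 16 and a ratio of 1/16 are equally far from no change.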
Step 3.c - Gene Expression Profile Data - Output
http://www.genome.jp/kegg/
Action: Schematic for interpreting the results of gene expression data analysis
Description of the action: Show the simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
Filter data - Fold Change
Functional Analysis - GO
Summary Statistics
Heat Map
Audio Narration: The summary statistics give a statistical overview of the genes that pass the cutoff specified to the gene expression analysis server. This includes the number of genes observed to be regulated and the statistical significance of the corresponding fold change.
Step 3.d - Gene Expression Profile Data - Results
http://www.genome.jp/kegg/
Action: Schematic for interpreting the results of gene expression data analysis
Description of the action: Show the simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
Molecular Functions
1. catalytic activity
2. hydroxymethylglutaryl-CoA synthase activity
3. cytokine activity
4. protein binding
5. chemokine activity
6. G-protein-coupled receptor binding
7. signal transducer activity
Biological Functions
1. lipid metabolic process
2. fatty acid metabolic process
3. positive regulation of endothelial cell proliferation
4. angiogenesis
5. apoptosis
6. cell adhesion
7. response to hypoxia
Cellular Components Affected
1. endoplasmic reticulum
2. extracellular region
3. soluble fraction
4. cytoplasm
5. membrane fraction
Filter data - Fold Change
Functional Analysis - GO
Summary Statistics
Heat Map
Audio Narration: The Functional Analysis tool reports the functions in which the regulated genes are involved: their molecular functions, the biological processes they take part in, and the cellular components they affect.
Step 4 - Gene Expression Profile Data - Visualization
http://www.ingenuity.com/
Action: Static slide
Description of the action: The animator needs to re-draw all screenshots, as they have been taken from the reference software, and must not copy the image, or any part of it, into the final animation. Show the image with audio narration. Show the zooming effect as shown in the animation.
Audio Narration: The pathway information relevant to glioma studies can be extracted from the input data. Here we show the merged gene regulatory pathway. We zoom into the pathway titled “Cell Cycle, Cellular Assembly and Organization, DNA Replication, Recombination, and Repair” and view the interactions of the TP53 pathway.
Master Layout (Part 2)
This animation consists of 3 parts:
Part 1: Gene Expression Data Analysis
Part 2: Protein Interaction Data Analysis
Part 3: Metabolic Profile Databases
Retrieve protein interaction data from experiments or public repositories
Input the data into the software tool in the correct format
View, download and interpret the results
http://www.genome.jp/kegg/
Definitions of the components: Part 2 – Protein Interaction Data Analysis
1. Knowledgebase: Protein interaction network tools accept the user's data and map it to their own repository. This repository is called the tool's knowledgebase.
2. Accession Number: The accession number of a protein is its unique identifier, which acts as the common link between the data provided as input by the user and the knowledgebase of the tool.
3. Protein microarray: Miniaturized arrays, commonly printed on glass, polyacrylamide gel pads or microwells, onto which small quantities of thousands of proteins can be immobilized simultaneously for high-throughput assays.
Gene Expression Profile Data–Option
Action: Option for the user to view Input or Output
Description of the action: The Data Generation box should be linked to Step 1. The Input box should be linked to the Step 2 input slides, and the Output box to the Step 3 output slides. The Visualization box should be linked to Step 4. This slide gives the user the option to go through only specific content from the animation.
Audio Narration: To view the protocol for submitting files, click on Input. To view the protocol for retrieving and analyzing output files, click on Output. To proceed to the full animation, click on the arrow.
DATA GENERATION
INPUT
OUTPUT
VISUALIZATION
Proceed to Full Animation
Step 1.a - Protein Molecular Interaction Network –Data Extraction
Protein Microarray Chips
Scanned Slides
Protein Samples
Audio Narration: Users can extract protein microarray data from microarray experiments. The normalized microarray data gives an insight into the regulation of the genes. This regulation is assessed by analyzing the microarray data with Gene Expression Profile Data Analysis software. For a detailed insight into the microarray technique, see the OSCAR animation on Microarray Technologies.
Action: Schematic for extracting the data for the defined problem
Description of the action: Follow the animation. Re-draw the figures.
Step 1.b - Protein Molecular Interaction Network –Data Extraction
Audio Narration: Protein molecular interaction software is used to build and analyze networks of proteins, given their accession numbers. The networks are built by mapping the input data to the software’s knowledgebase. Here, we illustrate this with a list of proteins modulated in glioma, extracted from:
1. Literature resources
2. Microarray databases
As output, we get a spreadsheet containing the microarray data.
PMID ACCESSION NUMBER
PROTEIN NAME
GLIOMA TYPE
VALIDATION FOLD CHANGE
p-VALUE
Extract Data from Literature sources and store it in a spreadsheet
Action: Schematic for extracting the data for the defined problem
Description of the action: The first panel is about extracting information from web resources. Show the required PDFs being downloaded and read through to extract the data. Follow this with a screenshot of microarray databases. At the end, show the “Rawdata.xls” file being created.
Rawdata.xls
Literature Resource
Query Term High-Grade glioma
Extract data from Microarray Data repositories
Step 1.c - Protein Molecular Interaction Network –Data Extraction
Audio Narration: Protein molecular interaction software is used to build and analyze networks of proteins, given their accession numbers. The networks are built by mapping the input data to the software’s knowledgebase. Here, we illustrate this with a list of proteins modulated in glioma, extracted from literature resources or databases.
PMID ACCESSION NUMBER
PROTEIN NAME
GLIOMA TYPE
VALIDATION FOLD CHANGE
p-VALUE
Extract data from Microarray Data repositories
Action: Schematic for extracting the data for the defined problem
Description of the action: The first panel is about extracting information from web resources. Show the required PDFs being downloaded and read through to store specific data in spreadsheets.
Rawdata.xls
Step 2.a - Protein Molecular Interaction Network –Input
CREATE PROJECT UPLOAD MAP DATA
Enter Project Name
Enter Experiment Type
Biomarker Analysis, Core Analysis, Toxicology Analysis, Metabolic Analysis
Core Analysis
Action: Schematic for input
Description of the action: Show the simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
Audio Narration: The user must enter the names of the project and the experiment in the software so that the current status of the work can be saved. Under experiment type, the user must select the type of analysis to be conducted on the dataset. For this glioma case study, we undertake a core analysis of the data to identify its network.
Project Glioma
Step 2.b - Protein Molecular Interaction Network –Input
CREATE PROJECT UPLOAD DATA MAP DATA
Upload Excel File Folder1/Rawdata.xls
PMID Protein Name Accession Number Glioma Type
17653765 Fructose bisphosphate aldolase 78070601 anaplastic oligodendroglioma
17653765 Phosphoglycerate mutase 1 56081766 anaplastic oligodendroglioma
17653765 Carbonic anhydrase ii 443135 anaplastic oligodendroglioma
Enolase 1 4503571 Glioblastoma multiforme
Enolase 693933 Glioblastoma multiforme
a-Enolase like 1 3282243 Glioblastoma multiforme
Enolase 1 4503571 Glioblastoma Multiforme
Aldolase C, fructose biphosphate P09972 glioblastoma,Grade II,III,IV
Enolase 1 P06733 glioblastoma,Grade II,III,IV
Enolase 2 P09104 glioblastoma,Grade II,III,IV
Glyceraldehyde-3-phosphate dehydrogenase, liver P04406 glioblastoma,Grade II,III,IV
Lactate dehydrogenase B P07195 glioblastoma,Grade II,III,IV
Phosphoglycerate kinase 1 P00558 glioblastoma,Grade II,III,IV
Phosphoglycerate mutase 1, brain Q6P6D7 glioblastoma,Grade II,III,IV
Pyruvate kinase, isozymes M1/M2 P14618-2 glioblastoma,Grade II,III,IV
Pyruvate kinase, isozymes M1/M2, splice isoform M1 P14618 glioblastoma,Grade II,III,IV
Triosephosphate isomerase P60174 glioblastoma,Grade II,III,IV
Pyruvate kinase NI Malignant Glioma
Glyceraldehyde 3-phosphate dehydrogenase P04406 Malignant Glioma
Triosephosphate isomerase P60174 Malignant Glioma
Enolase 1 P06733 Malignant Glioma
Aldolase A NI Malignant Glioma
19109410 GAPDH P16858 Glioma gradeIII,IV
19109410 Pyruvate kinase isozyme M1/M2 P52480 Glioma gradeIII,IV
19109410 Alpha-Enolase P17182 Glioma gradeIII,IV
19109410 Phosphoglycerate kinase 1 P09411 Glioma gradeIII,IV
19109410 GAPDH P16858 Glioma gradeIII,IV
MENTION THE TYPE OF IDENTIFIER, SUCH AS UNIPROT, GENBANK ID, REFSEQ ID, ENTREZ GENE, ETC.
Step 2.c - Protein Molecular Interaction Network –Input
Action: Schematic for input
Description of the action: Show the simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the next tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
Audio Narration: Upload the raw data file that was created after scrutinizing the papers. The required format of the raw data file varies between software tools. Although most tools recognize spreadsheet formats, some have their own input file format, such as the .sif file for Cytoscape. Once the raw data file is uploaded, the tool displays all of its columns, and the user selects the columns to be passed to the tool. Of all the columns, only the ACCESSION NUMBER (or another protein identifier) is compulsory. This column is highlighted in red. Identifiers can be of several types, and the type must be declared so that the tool can match the user’s data to its dictionary of identifier terms, called the knowledgebase. All other information is optional and can be provided depending on the nature of the analysis.
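Declaring the identifier type and checking that every row carries an accession number can be sketched as below. This is an illustration, not the behavior of any specific tool: the function names are hypothetical, the UniProt pattern follows the accession format published by UniProt (extended here with an optional isoform suffix such as "-2"), purely numeric values are only assumed to be NCBI-style IDs, and "NI" is assumed to mean that no identifier was available.

```python
import re

# UniProt accession format, with an optional isoform suffix such as "-2".
UNIPROT = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})(?:-\d+)?$"
)


def identifier_type(accession: str) -> str:
    """Classify one accession string from the uploaded spreadsheet."""
    accession = accession.strip()
    if not accession or accession.upper() == "NI":
        return "missing"    # assumption: "NI" marks an unavailable identifier
    if UNIPROT.match(accession):
        return "uniprot"
    if accession.isdigit():
        return "numeric"    # assumption: an NCBI-style numeric ID
    return "unknown"


def missing_accessions(rows):
    """Return the protein names of rows that lack the compulsory identifier."""
    return [name for name, acc in rows if identifier_type(acc) == "missing"]


# (protein name, accession number) pairs copied from the input table.
rows = [
    ("Enolase 1", "P06733"),
    ("Pyruvate kinase", "NI"),
    ("Fructose bisphosphate aldolase", "78070601"),
]
```

Running the check before upload shows which rows would need a manual identifier lookup, here the "Pyruvate kinase" row.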
Step 2.d - Protein Molecular Interaction Network –Input
CREATE PROJECT UPLOAD DATA MAP DATA
PMID Protein Name Accession Number Glioma Type
17653765 Fructose bisphosphate aldolase 78070601 anaplastic oligodendroglioma
17653765 Phosphoglycerate mutase 1 56081766 anaplastic oligodendroglioma
17653765 Carbonic anhydrase ii 443135 anaplastic oligodendroglioma
Enolase 1 4503571 Glioblastoma multiforme
Enolase 693933 Glioblastoma multiforme
a-Enolase like 1 3282243 Glioblastoma multiforme
Enolase 1 4503571 Glioblastoma Multiforme
Aldolase C, fructose biphosphate P09972 glioblastoma,Grade II,III,IV
Enolase 1 P06733 glioblastoma,Grade II,III,IV
Enolase 2 P09104 glioblastoma,Grade II,III,IV
Glyceraldehyde-3-phosphate dehydrogenase, liver P04406 glioblastoma,Grade II,III,IV
Lactate dehydrogenase B P07195 glioblastoma,Grade II,III,IV
Phosphoglycerate kinase 1 P00558 glioblastoma,Grade II,III,IV
Phosphoglycerate mutase 1, brain Q6P6D7 glioblastoma,Grade II,III,IV
Pyruvate kinase, isozymes M1/M2 P14618-2 glioblastoma,Grade II,III,IV
Pyruvate kinase, isozymes M1/M2, splice isoform M1 P14618 glioblastoma,Grade II,III,IV
Triosephosphate isomerase P60174 glioblastoma,Grade II,III,IV
Pyruvate kinase NI Malignant Glioma
Glyceraldehyde 3-phosphate dehydrogenase P04406 Malignant Glioma
Triosephosphate isomerase P60174 Malignant Glioma
Enolase 1 P06733 Malignant Glioma
Aldolase A NI Malignant Glioma
19109410 GAPDH P16858 Glioma gradeIII,IV
19109410 Pyruvate kinase isozyme M1/M2 P52480 Glioma gradeIII,IV
19109410 Alpha-Enolase P17182 Glioma gradeIII,IV
19109410 Phosphoglycerate kinase 1 P09411 Glioma gradeIII,IV
19109410 GAPDH P16858 Glioma gradeIII,IV
Step 2.d - Protein Molecular Interaction Network –Input
Audio Narration: The input raw data is mapped to the knowledgebase of the software to provide a uniform set of IDs for building a network. The IDs from the input file that do not match the knowledgebase are highlighted in red.
Action: Schematic for input
Description of the action: This file is the same as the input file. Only the entries that are not mapped need to be highlighted in the animation.
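The mapping step amounts to a dictionary lookup that separates matched from unmatched identifiers. A minimal sketch follows, assuming the knowledgebase can be modeled as a plain mapping from accession to gene symbol; the function name is hypothetical, and the dictionary holds only a subset of the pairs listed in the Step 2.e output table.

```python
# Subset of the accession-to-gene pairs shown in the Step 2.e output table.
KNOWLEDGEBASE = {
    "78070601": "ALDOC", "56081766": "PGAM1", "4503571": "ENO1",
    "P09972": "ALDOC", "P06733": "ENO1", "P09104": "ENO2",
    "P04406": "GAPDH", "P07195": "LDHB", "P00558": "PGK1",
}


def map_to_knowledgebase(accessions, knowledgebase):
    """Split input accessions into mapped (with symbol) and unmapped lists."""
    mapped, unmapped = {}, []
    for acc in accessions:
        symbol = knowledgebase.get(acc)
        if symbol is None:
            unmapped.append(acc)   # these entries would be highlighted in red
        else:
            mapped[acc] = symbol
    return mapped, unmapped


mapped, unmapped = map_to_knowledgebase(
    ["78070601", "443135", "P06733", "P09411"], KNOWLEDGEBASE
)
```

In this example 443135 and P09411 come back unmapped. That they are unmapped in the real tool is inferred only from their absence in the output table, so treat it as an assumption.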
Step 2.e - Protein Molecular Interaction Network –Input
CREATE PROJECT UPLOAD MAP DATA
Data gets mapped to Knowledgebase of software to produce output files
ID Gene Description Location Family
78070601 ALDOC* aldolase C, fructose-bisphosphate Cytoplasm enzyme
56081766 PGAM1* phosphoglycerate mutase 1 (brain) Cytoplasm phosphatase
4503571 ENO1* enolase 1, (alpha) Cytoplasm transcription regulator
693933 ENO1* enolase 1, (alpha) Cytoplasm transcription regulator
P09972 ALDOC* aldolase C, fructose-bisphosphate Cytoplasm enzyme
P06733 ENO1* enolase 1, (alpha) Cytoplasm transcription regulator
P09104 ENO2 enolase 2 (gamma, neuronal) Cytoplasm enzyme
P04406 GAPDH (includes EG:2597)* glyceraldehyde-3-phosphate dehydrogenase Cytoplasm enzyme
P07195 LDHB lactate dehydrogenase B Cytoplasm enzyme
P00558 PGK1* phosphoglycerate kinase 1 Cytoplasm kinase
Q6P6D7 PGAM1* phosphoglycerate mutase 1 (brain) Cytoplasm phosphatase
P14618-2 PKM2* pyruvate kinase, muscle Cytoplasm kinase
P14618 PKM2* pyruvate kinase, muscle Cytoplasm kinase
P60174 TPI1* triosephosphate isomerase 1 Cytoplasm enzyme
P04406 GAPDH (includes EG:2597)* glyceraldehyde-3-phosphate dehydrogenase Cytoplasm enzyme
P60174 TPI1* triosephosphate isomerase 1 Cytoplasm enzyme
P06733 ENO1* enolase 1, (alpha) Cytoplasm transcription regulator
P16858 GAPDH (includes EG:14433)* glyceraldehyde-3-phosphate dehydrogenase Plasma Membrane enzyme
P52480 PKM2* pyruvate kinase, muscle Cytoplasm kinase
P17182 ENO1* enolase 1, (alpha) Cytoplasm transcription regulator
Step 2.e - Protein Molecular Interaction Network –Input
Audio Narration: The tool also extracts other relevant information corresponding to each ID from its knowledgebase. The uniform IDs and the new columns are displayed as a new spreadsheet containing the refined data. The columns highlighted in blue are newly added. The red column provides uniformity by applying one specific naming scheme to the identifiers.
Action: Schematic for input
Description of the action: Show the simulation of the software. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the next tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
Step 3 - Protein Interaction Data Analysis - Output
BUILD PATHWAY OUTPUT NETWORK OUTPUT PATHWAY
TOP NETWORK FUNCTIONS
1. Genetic Disorder, Neurological Disease, Nucleic Acid Metabolism
2. Cell-To-Cell Signaling and Interaction, Nervous System Development and Function, Cellular Assembly and Organization
3. Cancer, Reproductive System Disease, Gastrointestinal Disease
TOP DISEASE NETWORK
1. Cancer
2. Gastrointestinal Disease
3. Neurological Disease
TOP PHYSIOLOGICAL NETWORK
1. Nervous System Development and Function
2. Hematological System Development and Function
3. Immune Cell Trafficking
TOP CANONICAL PATHWAY
1. Glycolysis/Gluconeogenesis
2. Mitochondrial Dysfunction
3. 14-3-3-mediated Signaling
Audio Narration: The tool provides a summary of results showing the top networks produced in each category. The ranking is based on the number of mappings from the user’s input dataset to the software’s knowledgebase. The prediction of “Neurological Disease”, “Cancer” and “Nervous System” as top networks reinforces our data analysis. The analysis also shows that “Glycolysis/Gluconeogenesis” is the pathway being modulated by our list of proteins.
Action: Schematic for output summary
Description of the action: Follow the animation.
Step 4.a -Protein Molecular Interaction Network - Output
BUILD PATHWAY OUTPUT NETWORK OUTPUT PATHWAY
Audio Narration: Users can adjust the parameters that define the number and size of the networks to be formed. Users can also choose whether molecules other than genes, proteins or RNA are included. The molecules that have known relationships with other genes or proteins in the knowledgebase are mapped into the network. Repeated IDs point to the same node in the network.
Select the number of networks to be constructed: 1
Select the maximum number of molecules in the network: 70
Select endogenous chemicals: No
Action: Schematic for output summary
Description of the action: Follow the animation.
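The statement that repeated IDs point to the same node, together with the cap on the number of molecules, can be sketched as follows. The function name `build_nodes` and the policy of skipping new molecules once the cap is reached are assumptions made for illustration; the accession-to-gene pairs are copied from the mapped output table.

```python
def build_nodes(id_to_symbol, max_molecules=70):
    """Collapse repeated or synonymous IDs onto one node per gene symbol."""
    nodes = {}
    for acc, symbol in id_to_symbol.items():
        if symbol not in nodes and len(nodes) >= max_molecules:
            continue  # network is full: skip molecules that would add a node
        nodes.setdefault(symbol, []).append(acc)
    return nodes


# Four different accessions from the input all resolve to ENO1.
nodes = build_nodes({
    "4503571": "ENO1",
    "693933": "ENO1",
    "P06733": "ENO1",
    "P17182": "ENO1",
    "P09104": "ENO2",
})
```

Five input IDs yield two nodes here, which is why the network can be much smaller than the uploaded spreadsheet.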
Step 4.b - Protein Molecular Interaction Network - Output
BUILD PATHWAY OUTPUT NETWORK OUTPUT PATHWAY
Action: Schematic for output summary
Description of the action: Follow the animation. Highlight the yellow boxes in the animation as well.
Audio Narration: From the input given by the user, the tool identifies the molecules that are present in its database of metabolic networks. The molecules that occur most frequently are used as seeds, which connect to other such molecules. Networks are also extended on the basis of interactions between two small networks to produce a larger network. This analysis depends on the parameters set by the user in the initial steps. Based on this information, the tool predicts the pathway to which the molecules most likely belong. These pathways can be analyzed further using metabolic profile databases.
Legend: Seed molecules, Molecular interaction, Another small interaction network, Network interaction
α -D-Glucose-6P
β-D-Fructose-6P
β-D-Fructose-1,6-P2
Glyceraldehyde-3P
α-D-Glucose-1P
Glyceraldehyde-2P
Phosphoenol pyruvate
Starch and sucrose metabolism
Pentose Phosphate Pathway
Glycerone-P
β-D-Glucose
α -D-Glucose
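The idea of growing a network outward from seed molecules can be illustrated with a breadth-first expansion from the most connected nodes. This is a simplified stand-in, not the algorithm used by the tool shown in the animation; the function name, the choice of degree as the measure of "most frequent", and the molecule names m1 to m5 are all hypothetical.

```python
from collections import deque


def grow_network(edges, n_seeds=1, max_molecules=70):
    """Pick the most connected molecules as seeds and expand breadth-first."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # Seeds: highest degree first, ties broken alphabetically.
    ranked = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    seeds = ranked[:n_seeds]
    network, queue = [], deque(seeds)
    while queue and len(network) < max_molecules:
        node = queue.popleft()
        if node in network:
            continue
        network.append(node)
        queue.extend(sorted(adjacency[node] - set(network)))
    return seeds, network


# Hypothetical interactions between placeholder molecules.
edges = [("m1", "m2"), ("m1", "m3"), ("m1", "m4"), ("m4", "m5")]
seeds, network = grow_network(edges)
```

Lowering `max_molecules` truncates the expansion, mirroring how the user-set parameters from the previous step limit the size of the resulting network.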
Step 4.c - Protein Molecular Interaction Network - Visualization
http://www.ingenuity.com/
Action: Zoom effect
Description of the action: The animator needs to re-draw all screenshots, as they have been taken from the reference software, and must not copy the image, or any part of it, into the animation. Show the image with each part zooming in and then appearing as a zoomed image.
Audio Narration: The pathway information relevant to glioma studies can be extracted from the input data. In this pathway, we can observe the role of isocitrate dehydrogenase (IDH) in the regulation of metabolism during glioma. A recently published study has also shown the involvement of IDH in glioma-related pathways. Most such software is linked to protein pathway interaction software, which is described in detail in the next part of the animation.
Master Layout (Part 3)
This animation consists of 3 parts:
Part 1: Gene Expression Data Analysis
Part 2: Protein Interaction Data Analysis
Part 3: Metabolic Profile Databases
Select the level of organization of the biological system to study
Select from one of the publicly available databases
Select the relevant options in the database to view the pathway network and interaction data of the system under consideration
http://www.genome.jp/kegg/
Definitions of the components: Part 3 – Metabolic profile databases
1. Biological System: In the biological context, a system refers to an entity that exists through the mutual interactions between its components.
2. Level of organization: The level of organization describes the complexity of the biological system being studied. The components of one system can be made up of constituent parts, which in turn form another system at a different level of organization. For example, a cell is a system in itself, but within a larger physiological system a cell is only a component.
3. Visualization: Protein-protein interaction data is retrieved as elaborate spreadsheets, and interpreting such lists directly makes the analysis cumbersome. Mapping the data into a diagrammatic form makes it easier for scientists to develop a biological insight into the interaction data.
4. Functional annotation: By examining maps of protein–protein interaction data, researchers can discover new biological relationships between proteins or predict their functions based on specific interactions.
5. Graphical Notation: The first step in the analysis of protein interaction data is the identification of protein complexes and groups of complexes. In a simple graphical notation, a “node” represents a protein, while an “edge” represents the interaction between two proteins.
6. Pathway: A pathway in biology refers to a series of inter-related metabolic reactions that depicts the order of conversion of one entity into another.
7. Meta node: A single node onto which all members of a protein cluster are collapsed. Meta nodes help in deciphering the biological role of the networks that are collapsed into one.
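The node and edge notation of definition 5 and the meta node of definition 7 can be made concrete with a short sketch. It assumes an undirected network stored as a list of protein pairs; the function name and the protein labels P1 to P4 are hypothetical.

```python
def collapse_to_meta_node(edges, cluster, meta_name):
    """Replace every member of `cluster` with one meta node.

    Edges that lie entirely inside the cluster disappear, and duplicate
    edges created by the collapse are merged.
    """
    collapsed = set()
    for a, b in edges:
        a = meta_name if a in cluster else a
        b = meta_name if b in cluster else b
        if a != b:
            collapsed.add(frozenset((a, b)))  # undirected edge
    return collapsed


# Nodes are proteins; each pair is an interaction (edge) between two proteins.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P4")]
meta = collapse_to_meta_node(edges, {"P1", "P2"}, "M")
```

After collapsing P1 and P2 into the meta node M, three edges remain: M to P3, P3 to P4, and M to P4.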
Step 1: Pathway Databases – Input
Choose the system
PATHWAY: METABOLISM, GENETIC INFORMATION PROCESSING, ENVIRONMENTAL INFORMATION PROCESSING, CELLULAR PROCESSES, ORGANISMAL SYSTEMS
DISEASE: CANCER, IMMUNE SYSTEM DISEASE, NEURODEGENERATIVE DISEASE, CARDIOVASCULAR DISEASE, METABOLIC DISEASES, INFECTIOUS DISEASES
ENZYMES: ENZYME NAME, EC NUMBER, SYNONYMS
ORGANISM: PROKARYOTES, PROTISTS, FUNGI, PLANTS, ANIMALS
Animation of the Input search strategies for Pathway databases
Follow the steps in the animation. Re-draw images. The audio narration must be read as the cursor in the animation moves to the 4 headings of the web page.
Pathway databases are repositories that give a visual insight into the biological interactions of genes and proteins. The general features of these databases include searching by:
1. Pathway: The entire network information in the web-based database can be searched by selecting the metabolic pathway of interest, such as cellular processes or genetic information flow.
2. Diseases: All the networks are grouped by the diseases that are caused by their modulation.
3. Enzymes: The enzymes belonging to the pathway database are grouped, and the pathways can be searched by giving enzyme information as a query.
4. Organism: Each organism is given a unique identifier. Users can select an organism and then study the pathway as it occurs in that organism.
http://www.genome.jp/kegg/
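The four search strategies can be sketched as filters over a small in-memory collection. This is only an illustration under stated assumptions: the record layout, the identifiers PW_0001 and PW_0002, the entry names and the `search` function are all hypothetical, and none of it reproduces the interface of a real pathway database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathwayRecord:
    pathway_id: str    # hypothetical identifier
    name: str
    category: str      # the "Pathway" heading, e.g. "Metabolism"
    diseases: tuple
    enzymes: tuple
    organisms: tuple


# Placeholder records, invented for illustration only.
RECORDS = [
    PathwayRecord("PW_0001", "Example pathway A", "Cellular Processes",
                  ("Cancer",), ("Enzyme X",), ("Animals",)),
    PathwayRecord("PW_0002", "Example pathway B", "Metabolism",
                  ("Metabolic Diseases",), ("Enzyme X", "Enzyme Y"),
                  ("Animals", "Fungi")),
]


def search(records, *, pathway=None, disease=None, enzyme=None, organism=None):
    """Return the records matching every criterion that was supplied."""
    hits = []
    for record in records:
        if pathway is not None and record.category != pathway:
            continue
        if disease is not None and disease not in record.diseases:
            continue
        if enzyme is not None and enzyme not in record.enzymes:
            continue
        if organism is not None and organism not in record.organisms:
            continue
        hits.append(record)
    return hits


for hit in search(RECORDS, disease="Cancer"):
    print(hit.pathway_id, hit.name)
```

Criteria can be combined, so `search(RECORDS, enzyme="Enzyme X", organism="Fungi")` narrows the result to entries that satisfy both.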
Step 2.a - Pathway Databases – Visualization of Pathways for Glioma
Zoomed Images. The animator needs to re-draw all screenshots, as they have been taken from the referenced software. The animator must not copy an image, or a part thereof, in the final animation. Display the image. Highlight the “nodes” and “edges” as shown in the animation. The red box marks the area of the network that is being zoomed into. This is followed by the zoomed image of that part of the network. Each zoomed image is followed by the narration in the order given.
We use pathway databases to study one of the pathways from our Glioma studies in Protein Interaction Networks, namely “Cell Cycle, Cellular Assembly and Organization, DNA Replication, Recombination, and Repair”. Here we highlight the nodes and edges within the pathway: the nodes are genes and the edges are the interactions between them. Users can also obtain images from such visualization tools for the interactions of a specific gene; in this case we depict the interactions of TP53, derived from the Glioma studies.
http://www.ingenuity.com/, http://www.cytoscape.org/
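Zooming into the interactions of one gene amounts to keeping only the edges that touch it. The sketch below assumes the network is available as a list of gene pairs; GENE_A to GENE_D are placeholders, not interaction partners reported by the Glioma study, and the function names are hypothetical.

```python
def neighbours(edges, gene):
    """Return the set of genes that share an edge with `gene`."""
    partners = set()
    for a, b in edges:
        if a == gene:
            partners.add(b)
        elif b == gene:
            partners.add(a)
    return partners


def zoom(edges, gene):
    """Return only the edges that involve `gene` (the zoomed view)."""
    return [(a, b) for a, b in edges if gene in (a, b)]


# Placeholder edge list; only TP53 is named in the narration.
EDGES = [
    ("TP53", "GENE_A"),
    ("TP53", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_D"),
]

print(sorted(neighbours(EDGES, "TP53")))
print(zoom(EDGES, "TP53"))
```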
http://www.genome.jp/kegg/
Step 2.c - Pathway Databases – Interpretation
ORTHOLOGY ENZYME REACTION
Definition: glucose-1-phosphate phosphodismutase
Pathways it belongs to:
Pathway_ID1: Glycolysis / Gluconeogenesis
Pathway_ID2: Starch and sucrose metabolism
Genes involved:
Gene_ID123: BSU
Gene_ID273: BLI
Gene_ID987: BLD
Gene_ID789: SPZ
Options given once you click on a particular entity of Pathway
Pathways can also be obtained for protein interaction networks. In such networks, the metabolites are the “nodes” and the reactions between them are the “edges”. Each node, such as a substrate, reactant or enzyme, is hyperlinked to another page that gives detailed information about that entity. Each element of the pathway, including the pathway itself, is assigned an identifier so that it can be referred to from anywhere in the database. The page also gives the information related to the molecule or reaction, such as its orthology, the pathways it belongs to and the corresponding gene IDs.
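The idea that every element carries an identifier and can be looked up from anywhere can be sketched as a record stored in a dictionary keyed by that identifier. The pathway and gene values below are the placeholders shown on this slide; the entry identifier ENTRY_0001, the `Entry` class and the helper functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    identifier: str                                # key used everywhere in the database
    definition: str
    pathways: dict = field(default_factory=dict)   # pathway ID -> pathway name
    genes: dict = field(default_factory=dict)      # gene ID -> organism code


DATABASE = {}


def register(entry):
    """Store an entry under its identifier."""
    DATABASE[entry.identifier] = entry


def lookup(identifier):
    """Fetch an entry by identifier; raises KeyError if it is unknown."""
    return DATABASE[identifier]


register(Entry(
    identifier="ENTRY_0001",  # hypothetical identifier
    definition="glucose-1-phosphate phosphodismutase",
    pathways={
        "Pathway_ID1": "Glycolysis / Gluconeogenesis",
        "Pathway_ID2": "Starch and sucrose metabolism",
    },
    genes={
        "Gene_ID123": "BSU",
        "Gene_ID273": "BLI",
        "Gene_ID987": "BLD",
        "Gene_ID789": "SPZ",
    },
))

entry = lookup("ENTRY_0001")
print(entry.definition)
print(sorted(entry.pathways))
```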
In each slide, the highlighted tab is ACTIVE. In the animation, a tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
http://www.genome.jp/kegg/
Step 2.d - Pathway Databases – Interpretation
ORTHOLOGY ENZYME REACTION
Enzyme Commission Number: 2.7.1.41
Class of Enzyme: Transferases; Transferring phosphorus-containing groups; Phosphotransferases with an alcohol group as acceptor
Substrate: D-glucose 1-phosphate
Products: D-glucose, D-glucose 1,6-bisphosphate
Options given once you click on a particular entity of Pathway
It also gives all the enzyme-related information for the reaction, such as the enzyme nomenclature, Enzyme Commission Number, class of enzyme, substrates and products.
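An Enzyme Commission Number encodes the class of the enzyme in its first digit. The sketch below splits a four-part EC number into its levels and names the top-level class; the function name `parse_ec` is hypothetical, and the example number is simply one from the 2.7.1 group, which corresponds to the class listed on this slide.

```python
# Top-level EC classes as defined by the enzyme nomenclature.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec_number):
    """Split a complete EC number such as '2.7.1.41' into its four levels."""
    parts = ec_number.strip().split(".")
    # Partial numbers such as '2.7.1.-' are deliberately rejected in this sketch.
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a complete EC number: {ec_number!r}")
    cls, subclass, sub_subclass, serial = (int(part) for part in parts)
    if cls not in EC_CLASSES:
        raise ValueError(f"unknown EC class: {cls}")
    return {
        "class": EC_CLASSES[cls],
        "subclass": f"{cls}.{subclass}",
        "sub_subclass": f"{cls}.{subclass}.{sub_subclass}",
        "serial": serial,
    }


print(parse_ec("2.7.1.41"))
```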
In each slide, the highlighted tab is ACTIVE. In the animation, a tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
Step 2.e - Pathway Databases – Interpretation
ORTHOLOGY ENZYME REACTION
Metabolic Reaction: Metabolism; Carbohydrate Metabolism; Glycolysis / Gluconeogenesis
2 D-Glucose 1-phosphate <=> alpha-D-Glucose 1,6-bisphosphate + alpha-D-Glucose
http://www.genome.jp/dbget-bin/www_bget?R00960+RP00303+RC00078
Options given once you click on a particular entity of Pathway
Re-draw the equation. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should highlight when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
The metabolic reaction that the enzyme is involved in is also provided in its equation form along with structures of reaction substrates.
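A reaction given in equation form can be turned into substrate and product lists with their stoichiometric coefficients. The sketch below assumes the plain-text layout used on this slide, with `<=>` between the two sides and ` + ` between compounds; the function names are hypothetical, and a compound whose own name begins with a number followed by a space would be misread by this simple rule.

```python
import re


def parse_side(side):
    """Map each compound on one side of the equation to its coefficient."""
    terms = {}
    for term in re.split(r"\s\+\s", side.strip()):
        term = term.strip()
        match = re.match(r"^(\d+)\s+(.+)$", term)
        if match:
            coefficient, name = int(match.group(1)), match.group(2)
        else:
            coefficient, name = 1, term  # no leading number means one molecule
        terms[name] = terms.get(name, 0) + coefficient
    return terms


def parse_reaction(equation):
    """Split a reversible reaction into (substrates, products)."""
    left, right = equation.split("<=>")
    return parse_side(left), parse_side(right)


EQUATION = ("2 D-Glucose 1-phosphate <=> "
            "alpha-D-Glucose 1,6-bisphosphate + alpha-D-Glucose")
substrates, products = parse_reaction(EQUATION)
print(substrates)
print(products)
```

For the equation on this slide, the sketch reports two molecules of D-Glucose 1-phosphate on the substrate side and one molecule of each product.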
Interactivity option 1: Step No: 1 - Assignment
Type of Analysis Tools:
1. Gene Expression Profile Data Input
2. Protein Interaction Data Input
3. Metabolic Profile Data Input
Type of Input Data (yellow buttons): .gal Files, .cel Files, .gpr Files, .sif Files, .cdt Files, Name of Enzyme, Name of Pathway, List of Protein Identifiers, Name of Disease
Interactivity type: Drag and Drop.
Drag the yellow buttons into one amongst the 3 Analysis Tools. The correct results are given in the next slide.
If the user drags it into the right box, the animation should flash a “Tick” sign. If the box is incorrect, flash a “Cross” sign and ask the user to “Try Again”.
Interactivity option 1: Step No: 2 - Results
Correct assignment of the input data to the analysis tools:
Gene Expression Profile Data Input: .gal Files, .cel Files, .gpr Files, .cdt Files
Protein Interaction Data Input: List of Protein Identifiers, .sif Files
Metabolic Profile Data Input: Name of Enzyme, Name of Pathway, Name of Disease
Interactivity type: Drag and Drop.
Drag the yellow buttons into one amongst the 3 Analysis Tools.
If the user drags it into the right box, the animation should flash a “Tick” sign. If the box is incorrect, flash a “Cross” sign and ask the user to “Try Again”.
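The Tick or Cross feedback for the drag-and-drop exercise can be sketched as a lookup against an answer key. The assignment used here is an assumption: it treats .gal, .cel, .gpr and .cdt files as gene expression input, a list of protein identifiers and .sif files as protein interaction input, and enzyme, pathway and disease names as metabolic profile input. The names `EXPECTED_TOOL` and `check_drop` are hypothetical.

```python
GENE_EXPRESSION = "Gene Expression Profile Data"
PROTEIN_INTERACTION = "Protein Interaction Data"
METABOLIC_PROFILE = "Metabolic Profile Data"

# Assumed answer key: each yellow button mapped to its analysis tool.
EXPECTED_TOOL = {
    ".gal Files": GENE_EXPRESSION,
    ".cel Files": GENE_EXPRESSION,
    ".gpr Files": GENE_EXPRESSION,
    ".cdt Files": GENE_EXPRESSION,
    ".sif Files": PROTEIN_INTERACTION,
    "List of Protein Identifiers": PROTEIN_INTERACTION,
    "Name of Enzyme": METABOLIC_PROFILE,
    "Name of Pathway": METABOLIC_PROFILE,
    "Name of Disease": METABOLIC_PROFILE,
}


def check_drop(item, tool):
    """Return the feedback to flash after `item` is dropped onto `tool`."""
    if item not in EXPECTED_TOOL:
        raise KeyError(f"unknown input data type: {item!r}")
    if EXPECTED_TOOL[item] == tool:
        return "Tick"
    return "Cross: Try Again"


print(check_drop(".sif Files", PROTEIN_INTERACTION))
print(check_drop(".gal Files", METABOLIC_PROFILE))
```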
Questionnaire - 1
1. Which amongst these is not a feature of a Protein network?
a. Edges
b. Nodes
c. Metanodes
d. Antinodes
2. What are the results of Gene Expression Analysis?
a. Heat Map
b. Fold Change
c. P-value
d. All of the Above
3. Protein Pathways can be studied using?
a. Stand-alone tools
b. Web-based tools
c. Both
d. None
Questionnaire - 2
4. Which is a mandatory entry to study Protein Interaction Pathways?
a. Fold Change
b. p-Value
c. Unique Identifier like Accession Number
d. All of the Above
5. In case of Gene Expression Data Analysis, Heat Map represents?
a. Significance of the Gene
b. Fold Change
c. p-value
d. Gene Ontology
6. Which amongst these is a valid Microarray File Extension?
a. GAL
b. GPR
c. CEL
d. All of the Above
Links for further reading
Books
Systems Biology: An Approach. P Kohl, EJ Crampin, TA Quinn and D Noble.
An Introduction to Systems Biology: Design Principles of Biological Circuits by Uri Alon, June 2006, Chapman & Hall/CRC, Taylor and Francis Group.
Introduction to Systems Biology. Choi, Sangdun (California Institute of Technology), July 2007, Humana Press.
Research Papers
Visualizing biological pathways: requirements analysis, systems evaluation, and research agenda. Saraiya, P., North, C. & Duca, K. (2005).
Tools for visually exploring biological networks. Suderman, M. & Hallett, M (2007).
A survey of visualization tools for biological network analysis. Pavlopoulos, G.A., Wegener, A.L. & Schneider, R. (2008).
Visualization of omics data for systems biology. Nils Gehlenborg, Seán I O’Donoghue, Nitin S Baliga, Alexander Goesmann, Matthew A Hibbs, Hiroaki Kitano, Oliver Kohlbacher, Heiko Neuweger, Reinhard Schneider, Dan Tenenbaum & Anne-Claude Gavin. Nature (2010).
Webliography
http://www.genome.jp/kegg/
http://www.chem.agilent.com/Library/usermanuals/Public/GeneSpring-manual.pdf
http://www.moleculardevices.com/pages/software/gn_genepix_pro.html
http://www.cytoscape.org/
http://www.ingenuity.com/
http://www.genego.com/metacore.php
http://www.ece.cmu.edu/~brunos/Lecture3.pdf
http://pathways.embl.de/
http://www.biocyc.org/
http://www.arena3d.org/
http://spotfire.tibco.com/
http://www.bioconductor.org/
http://www.chem.agilent.com/en-US/Products/software/lifesciencesinformatics/genespringgx/pages/gp34727.aspx
http://www.cytoscape.org/download.php
Following URLs are used for animations
http://www.genome.jp/kegg/
Biochemistry by A.L.Lehninger et al., 3rd edition
http://www.ingenuity.com/
http://www.cytoscape.org/
http://www.genome.jp/dbget-bin/www_bget?R00960+RP00303+RC00078
http://www.genego.com/metacore.php
http://www.ece.cmu.edu/~brunos/Lecture3.pdf
http://pathways.embl.de/
http://www.chem.agilent.com/Library/usermanuals/Public/GeneSpring-manual.pdf
http://www.moleculardevices.com/pages/software/gn_genepix_pro.html