David Amar davidama/bioinfo_tutorials.

Cytoscape and networks David Amar http://tau.ac.il/~davidama/bioinfo_tutorials

  • Slide 1

    David Amar http://tau.ac.il/~davidama/bioinfo_tutorials

  • Slide 2
    Network biology
    Overview: systems biology
      Represent molecular entities
      Represent interactions
    Two main data types
      Pathways
      Interaction networks

  • Slide 3
    Biological interaction networks
    Nodes: genes or other molecules
    Edges: evidence for some interaction; can contain weights and directions
    Magtanong et al. 2011 Nature

  • Slide 4
    Biological interaction networks
    Nodes: genes/proteins or other molecules
    Edges based on evidence for interaction
    Voineagu et al. 2011 Nature
    Breker and Schuldiner 2009
    Gene co-expression
    Protein-protein interaction
    Genetic interaction

  • Slide 5
    Cytoscape
    Cytoscape is open source software for integrating, visualizing, and analyzing networks.
    This tutorial describes the Cytoscape 3 user interface.
    Outline
      Basics
        Load and visualize data
        Customize
      Applications
        Clustering
        Enrichment analysis
        GeneMANIA
        ModMap
        Gene expression analysis

  • Slide 6

  • Slide 7
    Initial window
    The toolbar contains command buttons; a button's name is shown when the mouse pointer hovers over it.
    Main Network View, initially blank.
    Control Panel: lists the available networks by name
    Network Overview Pane
    Table Panel: can be used to display node, edge, and network table data

  • Slide 8
    Load data: import from databases

  • Slide 9
    The initial window enables searching in the large public databases

  • Slide 10
    Load data: import from databases
    Search example: by gene name
    Choose databases

  • Slide 11
    Import result
    The imported networks by name
    Basic statistics

  • Slide 12
    Look at a network
    The toolbar contains command buttons; a button's name is shown when the mouse pointer hovers over it.
    Main Network View
    Control Panel: lists the available networks by name
    Network Overview Pane: move around!
    Table Panel: displays node, edge, and network table data

  • Slide 13
    Search for a gene
    Information about the marked nodes

  • Slide 14
    Load data: import all interactions

  • Slide 15

  • Slide 16
    Import result
    The new network

  • Slide 17
    Load data: from files
    We sometimes have our own data
      From papers
      A special search in a database
      Our experiment (e.g., correlation between genes)
    Common formats
      SIF
      A table
      OWL for pathways: complex text, but easy to get and very informative once uploaded

  • Slide 18
    Load from files

  • Slide 19
    Contains an interaction network of 331 genes from Ideker et al. 2001 Science

  • Slide 20
    Load data: from SIF files
    Text: name1 interaction_type name2

  • Slide 21
    Load data: from a table
    From Excel files or tab-delimited text tables

  • Slide 22
    Load data: from a table

  • Slide 23
    Set where to look for the nodes and the interaction type

  • Slide 24
    Load data: from a table
    OPTIONAL: Click on the columns that you want to keep as attributes

  • Slide 25
    Result

  • Slide 26
    Load data: OWL
    Good for looking at pathways
    This example: data from the Reactome database

  • Slide 27
    Load data: result
    Directed edges: signaling

  • Slide 28
    Zoom

  • Slide 29
    Focus on a selected region (nodes in yellow)

  • Slide 30
    Zoom: result
    Move around

  • Slide 31
    Get a sub-network

  • Slide 32
    The sub-network was created below the original network

  • Slide 33
    Save the session
    We imported six networks
    Before we start modifying them, let's save the session
    File -> Save
    Sanity check: close Cytoscape and load the session!
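
The SIF layout shown on the "Load data: from SIF files" slide (name1 interaction_type name2) is simple enough to inspect before importing it into Cytoscape. The following is a minimal Python sketch, not part of the original tutorial. It assumes tab- or whitespace-delimited lines, accepts several targets after the interaction type (which SIF permits), and skips lines that list only a single node. The function name `parse_sif` and the sample gene names are hypothetical.

```python
def parse_sif(text):
    """Parse SIF text into a list of (source, interaction_type, target) edges."""
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Tab-delimited if the line contains a tab, otherwise split on whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 3:
            continue  # a lone node with no interaction; skipped in this sketch
        source, interaction = fields[0], fields[1]
        # SIF allows several targets after the interaction type.
        for target in fields[2:]:
            edges.append((source, interaction, target))
    return edges


# Hypothetical sample data, for illustration only.
sample = "geneA pp geneB\ngeneA pd geneC geneD\ngeneE\n"
for edge in parse_sif(sample):
    print(edge)
```

Keeping the interaction type in each tuple makes it possible to filter or color edges by type later, as the style slides do inside Cytoscape.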

  • Slide 34
    Remarks
    At this point we know how to load data from databases and files
    We can perform simple navigation, zoom, and save
    We saved different networks, each with its own visualization rules
    A good habit that saves trouble: save a session for each visualization type
      Multiple networks, but keep a consistent visualization

  • Slide 35
    Modifying and saving a visualization
    Cytoscape supports countless options
      Layouts
      Node size, color, label
      Edge width, line type
    We will show the main examples, which are enough to start
    To save the graph as an image:

  • Slide 36
    Change the layout

  • Slide 37
    Organic layout

  • Slide 38
    Circular layout
    Places all of the nodes in a circular arrangement.
    Very quick
    Partitions the network into disconnected parts and independently lays out those parts.

  • Slide 39
    Force-directed
    Uses a physical simulation that models the nodes as physical objects and the edges as springs connecting those objects together.

  • Slide 40
    Change layout scale

  • Slide 41
    Change the scale
    Before: scale is 1
    Scale is 8

  • Slide 42
    Style
    Open and modify

  • Slide 43
    The IntAct network: node color

  • Slide 44
    Node color
    Each column represents some information that we have
    Discrete: set a value for each type of information

  • Slide 45

  • Slide 46
    Apps
    Cytoscape also has many tools called apps
    Install by going to Apps -> App Manager
    Applications support
      Advanced analysis
      Biological analysis
      Integrating data
      Import special data

  • Slide 47
    I) Find and annotate dense areas
    Use an app that clusters the network
    Biological assumption
      We look for protein communities
      Many interactions within
      Probably share function
    Gene function prediction

  • Slide 48
    Step 1: remove duplicated edges
    Sometimes nodes are linked by more than one edge
      Multiple evidence for interaction
    Remove them for clustering and simpler visualization

  • Slide 49
    Step 2: use ClusterViz

  • Slide 50
    Step 3: look at the results
    All clusters
    Sorted by size
    Select a cluster

  • Slide 51
    Step 3: look at the results

  • Slide 52
    Step 4: biological function?
    We discovered a cluster
      A set of highly connected proteins
    What biological processes/functions are enriched in this cluster?
    Discover significantly over-represented biological functions
      Compared to creating random clusters

  • Slide 53
    Step 4: BiNGO
    Select all nodes (Ctrl+A)

  • Slide 54
    Step 4: BiNGO
    Give the cluster a name (Cluster 1)
    Select human

  • Slide 55
    Step 4: Results
    Summary table
    GO graph
    Only corrected p-values matter!
    Mark in the network

  • Slide 56
    II) Analyze a gene set
    We have a set of genes we want to interpret
      From papers
      From data analysis
    We want to discover
      Functional enrichments
      How they interact among themselves and with similar genes
    Use GeneMANIA

  • Slide 57
    Resources and installation
    Installing GeneMANIA may take >30 minutes
    Steps
      1. Apps -> App Manager
      2. Install GeneMANIA
      3. Open GeneMANIA (Apps -> GeneMANIA)
         1. Confirm data download
         2. A new window will open: select human for this tutorial

  • Slide 58
    GeneMANIA
    Our input: a set of genes from Hauser et al. 2005 ( http://archneur.ama-assn.org/cgi/pmidlookup?view=long&pmid=15956162 )
    HSPA1B, HSPA1A, DNAJC6, DNAJB2, UBE1, PARK5, SLC25A5, COX5B, COX6C, NDUFA3, ATP5I, HK1, COX4I1, ATP1B1,
    COX6B, SLC25A3, NDUFS5, ATP5O, UQCRH, ATP5C1, NDUFB8, ATP5G3, ATP5C1, VDAC3, COX4I1, COX7B, NDUFA9, ATP1B1,
    ATP6V0A1, ATP6V0D1, ATP6V0C, ATP6V1B2, SLC9A6, ATP61P1, ATP6V1D, ATP6V0B, ATP6V1A1, ATP6V1E1, GDI1, STXBP1,
    SYT1, VAMP1

  • Slide 59
    GeneMANIA: input window
    Paste the gene names (or IDs) here, separated by spaces (no commas)

  • Slide 60
    GeneMANIA: input window

  • Slide 61
    The recognized genes and their full names
    The types of supported networks
    For each interaction type there is a list of networks that can be marked

  • Slide 62
    GeneMANIA: input window
    Use physical interactions, pathways, and co-expression for our example

  • Slide 63
    Results
    Information tables. For example: the detected functions
    The output network.
    Grey nodes are new genes that were added to improve the connectivity

  • Slide 64
    Results
    Mark a function: automatically marks the relevant nodes
    Layout was modified to organic for better visualization

  • Slide 65
    VS.

  • Slide 66
    Highlight specific interactions

  • Slide 67

  • Slide 68
    III) Analyze different interaction types
    Positive expected within families
    Negative expected between families
    Some networks contain both
    VS.
    Members of protein complex
    Members of parallel pathways

  • Slide 69
    Analysis of network pairs
    Interaction types can differ: within (positive) vs. between (negative) functional units
    Input: networks H, G with the same vertex set
    Goal: summarize both networks in a module map
      Node: a module, i.e., a gene set highly connected in H
      Link: two modules highly interconnected in G
    Between-pathway models
      Kelley and Ideker 2005
      Ulitsky et al. 2008
      Kelley and Kingsford 2011
      Leiserson et al. 2011

  • Slide 70
    Solution: ModMap
    Cytoscape app: under construction
    Currently: run the command line tool and upload to Cytoscape as a solution
    We will show how to upload a solution

  • Slide 71
    Load ModMap analysis
    Our example: combined analysis of yeast PPI and GI data
      Find GI among complexes
    1. Load the network: type interaction types
    2. Load the association of nodes to modules
    3. Color the results and set the layout

  • Slide 72
    Load the network
    Load the YeastData.xlsx file
    Important: we have several types

  • Slide 73
    Load the network
    Load the YeastData.xlsx file
    The network is large, so we tell Cytoscape to generate it

  • Slide 74
    Load a clustering solution
    Modmap_modules.txt file format (text file): Node module_name
    Import Table: a way to add external information about the nodes

  • Slide 75
    Load a clustering solution
    Right click and give it a name

  • Slide 76
    Load a clustering solution
    Right click and give it a name

  • Slide 77
    Load a clustering solution

  • Slide 78
    Layout a clustering solution

  • Slide 79
    Layout a clustering solution: results
    Unclustered nodes
    A circle for each cluster

  • Slide 80
    Remove unclustered nodes
    Mark the selected nodes and create a sub-network

  • Slide 81
    Remove self and duplicated edges

  • Slide 82
    Zoom in on a part of the solution
    Not informative enough: we cannot see edge types

  • Slide 83
    Change the visualization style

  • Slide 84

  • Slide 85

  • Slide 86
    IV) Overlay gene expression data
    Class/Home exercise (data in the exp_data directory)
    Load human PPI
    Load gene fold-change in a gene expression experiment
    Set node color and size by the fold change
    Play with the layout
      For example, group attribute layout
    Run BiNGO on a selected sub-network
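
The ModMap loading steps in the slides above (a "Node module_name" table, removing unclustered nodes, and removing self and duplicated edges) can also be sketched outside Cytoscape. This is a minimal Python sketch, not the ModMap tool or a Cytoscape API. It assumes a whitespace-delimited module file with one node per line and no header row, treats edges as undirected, and keeps edges of different interaction types apart. The function names and the toy data are hypothetical.

```python
def read_modules(text):
    """Map each node to its module from lines of the form 'node module_name'."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            modules[fields[0]] = fields[1]
    return modules


def clean_edges(edges, modules):
    """Keep edges between clustered nodes, dropping self loops and duplicates.

    Edges are (source, interaction_type, target) tuples. Two edges count as
    duplicates when they join the same unordered node pair with the same type.
    """
    seen = set()
    kept = []
    for source, interaction, target in edges:
        if source == target:
            continue  # self edge
        if source not in modules or target not in modules:
            continue  # at least one endpoint is unclustered
        key = (frozenset((source, target)), interaction)
        if key in seen:
            continue  # duplicated edge
        seen.add(key)
        kept.append((source, interaction, target))
    return kept


# Hypothetical toy data, for illustration only.
modules = read_modules("n1 M1\nn2 M1\nn3 M2\n")
edges = [
    ("n1", "PPI", "n2"),
    ("n2", "PPI", "n1"),  # duplicate of the first edge
    ("n1", "GI", "n3"),
    ("n3", "GI", "n3"),   # self edge
    ("n1", "GI", "n4"),   # n4 is unclustered
]
print(clean_edges(edges, modules))
```

Including the interaction type in the duplicate key matters here because the yeast example combines PPI and GI edges, and a PPI edge and a GI edge between the same pair are distinct pieces of evidence.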