Protein-Protein Interaction Network Gautam Chaurasia 08.07.04.

Seminar on Protein-Protein Interaction Network 2

Overview

Introduction

Three different models:

Structure of the protein-protein interaction network: a non-power-law model.

Evolution of the network: power-law random graphs.

Detection of functional modules from protein interaction networks: a clustering algorithm.

Introduction

The network is viewed as a graph whose nodes correspond to proteins. Two proteins are connected by an edge if they interact.

The collection of all interactions between the proteins of an organism is called the interactome.

The yeast two-hybrid (Y2H) system is used to produce a genome-scale map of the protein-protein interaction network.

The network resembles a random graph in that it consists of many small subnets (groups of proteins that interact with each other but with no other protein) and one large connected subnet comprising more than half of all interacting proteins.
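A minimal Python sketch of the graph view, assuming interactions are given as an undirected edge list (the protein names are hypothetical placeholders, not data from any screen): breadth-first search splits the network into connected subnets and reports the share of proteins in the largest one.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected subnets of an undirected graph, largest first."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search from an unvisited protein collects one subnet.
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        components.append(comp)
    return sorted(components, key=len, reverse=True)

# Hypothetical toy interaction list (not real data).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("C", "E"),
         ("F", "G"), ("H", "I")]
comps = connected_components(edges)
n_proteins = sum(len(c) for c in comps)
giant_fraction = len(comps[0]) / n_proteins
print(len(comps), sorted(comps[0]), giant_fraction)
```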

Structure of PPI Network

Yeast protein interaction network (Uetz et al. 2000).

A: A two-dimensional drawing of the entire network.

B: The giant (hub) component of this graph consists of 466 proteins.

C: A small section of the hub component, with gene or open reading frame names shown next to each node.

Structure of PPI Network

Degree: the degree (connectivity) k of a node is the number of links it has to other nodes.

Degree distribution: the degree distribution P(k) gives the probability that a randomly selected node has exactly k links.

P(k) is obtained by counting the number of nodes N(k) with k = 1, 2, ... links and dividing by the total number of nodes N: $P(k) = N(k)/N$.
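A small sketch of how P(k) could be computed from an edge list, assuming a simple undirected graph (no self-interactions or repeated edges). Here N counts only proteins that appear in at least one interaction, and the toy network is hypothetical.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: P(k)} with P(k) = N(k) / N for a simple undirected graph."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1   # each link adds one to the degree of both endpoints
        degree[b] += 1
    n_nodes = len(degree)                 # N: proteins with at least one link
    n_k = Counter(degree.values())        # N(k): number of proteins with degree k
    return {k: count / n_nodes for k, count in sorted(n_k.items())}

# Hypothetical toy network: hub H linked to four proteins, plus the link A-B.
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")]
dist = degree_distribution(edges)
print(dist)  # {1: 0.4, 2: 0.4, 4: 0.2}
```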

ER Random Graph

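ER Random Graphs: an ER (Erdős–Rényi) random graph consists of n nodes and k edges, where any pair of nodes is equally likely to be connected by one of the k edges.

Alternatively, start with N nodes and connect each pair independently with probability p, which creates a graph with approximately pN(N−1)/2 randomly placed links.

In this formulation each node's degree is binomially distributed, and for large N the degree distribution is well approximated by a Poisson distribution with mean p(N−1).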
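To see the Poisson behaviour, the following sketch samples a G(N, p) graph with Python's `random` module and compares the observed degree frequencies with Poisson probabilities of mean p(N−1). The values N = 1000 and p = 0.004 are arbitrary illustrative choices.

```python
import math
import random
from collections import Counter

def er_graph(n, p, seed=0):
    """G(N, p): keep each of the N(N-1)/2 possible links independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def poisson_pmf(k, mean):
    """Poisson probability of observing exactly k."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

n, p = 1000, 0.004
edges = er_graph(n, p)
degree = Counter()
for i, j in edges:
    degree[i] += 1
    degree[j] += 1
mean_k = p * (n - 1)
print(len(edges), "links; expected about", round(p * n * (n - 1) / 2))
for k in range(8):
    observed = sum(1 for v in range(n) if degree[v] == k) / n  # includes degree-0 nodes
    print(k, round(observed, 3), round(poisson_pmf(k, mean_k), 3))
```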

Scale-Free Network

Scale-free networks ("the rich get richer"): scale-free networks are characterized by a power-law degree distribution; the probability that a node has x links follows

$P(x) \propto x^{-\gamma}$,

where γ > 0, so that a plot of log(frequency) against log(degree) shows a decreasing linear trend.
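As a rough illustration of the log-log criterion, this sketch fits a least-squares line to log(frequency) against log(degree) and reads off γ as minus the slope. The counts are synthetic and noise-free; on real degree data this estimator is known to be biased, and maximum-likelihood fitting is generally preferred.

```python
import math

def loglog_slope(degree_counts):
    """Least-squares slope of log10(frequency) against log10(degree)."""
    pts = [(math.log10(k), math.log10(c)) for k, c in degree_counts.items() if k > 0 and c > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx

# Synthetic counts following x^(-2.5) exactly (not real data).
counts = {x: 1e6 * x ** -2.5 for x in range(1, 50)}
gamma_hat = -loglog_slope(counts)  # for P(x) ∝ x^(-gamma) the slope is -gamma
print(round(gamma_hat, 3))  # 2.5
```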

Model-I: Non-Power Law-I

The essence of this model is the observation that parts of proteins, called domains, contain sites to which complementary parts of other proteins can bind.

These complementary parts are referred to as the positive and negative aspects of a domain.

Bipartite sub-graphs: graphs comprising two disjoint sets of nodes in which each node in one set is connected to every node in the other set (complete bipartite graphs).

Fig 2: A particular domain whose positive form is present in three proteins (A, B and C) and whose negative form is present in four proteins (W, X, Y and Z), so that each of A, B and C interacts with all of W, X, Y and Z.
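The domain picture can be sketched in a few lines, assuming each protein is described by a set of (domain, sign) aspects; two proteins are linked when one carries the positive and the other the negative aspect of the same domain. Self-interactions are ignored here for simplicity. Reproducing the configuration of Fig 2 gives the complete bipartite graph with 3 × 4 = 12 edges.

```python
from itertools import combinations

def interactions_from_domains(proteins):
    """Link two proteins if one has the complementary aspect of a domain aspect of the other.

    proteins maps a protein name to a set of aspects such as ("d", "+").
    """
    edges = set()
    for a, b in combinations(sorted(proteins), 2):
        complementary = {(dom, "-" if sign == "+" else "+") for dom, sign in proteins[a]}
        if complementary & proteins[b]:
            edges.add((a, b))
    return edges

# Fig 2 configuration: domain "d" positive in A, B, C and negative in W, X, Y, Z.
proteins = {name: {("d", "+")} for name in "ABC"}
proteins.update({name: {("d", "-")} for name in "WXYZ"})
edges = interactions_from_domains(proteins)
print(len(edges))  # 12
```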

Model-I: Non-Power Law-II

We assume that there are n proteins and m domains, each with a negative and a positive form.

A domain may be any of the 2m types 1+, 1−, 2+, 2−, ..., m+, m−.

Each of the n proteins contains each of the 2m possible domain aspects independently with constant probability p.

Let X_i be the number of domain aspects carried by the ith protein. X_i is binomially distributed:

$P(X_i = x) = \binom{2m}{x} p^x (1-p)^{2m-x}, \quad x = 0, 1, \ldots, 2m.$

All the X_i are independent and identically distributed.

Thus, the average number of domains (sites) per protein is $E[X_i] = 2mp$.

Model-I: Non-Power Law-III

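Let Y_i be the number of interactions of the ith protein, and let q = 1 − p. Given X_i = x, any other protein j fails to connect to i only if it contains none of the x complementary domain aspects, which happens with probability q^x.

Since there are n − 1 such proteins, each connecting independently with probability 1 − q^x, we have:

$P(Y_i = y \mid X_i = x) = \binom{n-1}{y} (1-q^x)^y \, q^{x(n-1-y)}.$

Hence the unconditional distribution of Y_i is a binomial mixture of binomials:

$f(y) = P(Y_i = y) = \sum_{x=0}^{2m} \binom{2m}{x} p^x q^{2m-x} \binom{n-1}{y} (1-q^x)^y \, q^{x(n-1-y)}.$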

Model-I: Non-Power Law-IV

Using an inclusion-exclusion type expansion, $(1-q^x)^y = \sum_{k=0}^{y} (-1)^k \binom{y}{k} q^{xk}$, and summing over x, we get:

$f(y) = \binom{n-1}{y} \sum_{k=0}^{y} (-1)^k \binom{y}{k} \left(q + p\, q^{\,n-1-y+k}\right)^{2m}.$

Binomial distribution: the distribution of the number of successes in a fixed number of independent trials, each of which has only two possible outcomes and the same success probability.

For example, the number of tails in 20 tosses of a coin.

Inclusion-exclusion: let A denote a finite set and let P1, ..., Pn be any given properties. The principle expresses the number of elements of A that have none of these properties as an alternating sum of the numbers of elements that have given combinations of these properties.
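The two expressions for f(y) can be checked against each other numerically. This sketch evaluates the binomial mixture $\sum_x \binom{2m}{x} p^x q^{2m-x} \binom{n-1}{y} (1-q^x)^y q^{x(n-1-y)}$ term by term in log space and compares it with the alternating sum $\binom{n-1}{y} \sum_k (-1)^k \binom{y}{k} (q + p q^{n-1-y+k})^{2m}$, obtained by expanding $(1-q^x)^y$ binomially. The small parameter values are arbitrary; for large y the alternating sum suffers from cancellation, so only the mixture form is used for the n = 6000, m = 1000, λ = 2mp = 1 setting.

```python
import math

def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def f_mixture(y, n, m, p):
    """f(y) = sum_x P(X_i = x) P(Y_i = y | X_i = x), each term evaluated in log space."""
    log_q = math.log1p(-p)
    total = 0.0
    for x in range(2 * m + 1):
        log_px = log_binom(2 * m, x) + x * math.log(p) + (2 * m - x) * log_q
        if x == 0:
            # A protein with no domain aspects cannot interact: Y_i = 0 for certain.
            if y == 0:
                total += math.exp(log_px)
            continue
        log_hit = math.log(-math.expm1(x * log_q))  # log(1 - q^x), computed stably
        log_py = log_binom(n - 1, y) + y * log_hit + x * (n - 1 - y) * log_q
        total += math.exp(log_px + log_py)
    return total

def f_closed(y, n, m, p):
    """Inclusion-exclusion (alternating) form; prone to cancellation for large y."""
    q = 1.0 - p
    s = sum((-1) ** k * math.comb(y, k) * (q + p * q ** (n - 1 - y + k)) ** (2 * m)
            for k in range(y + 1))
    return math.comb(n - 1, y) * s

# Small case: both forms agree and the probabilities sum to 1.
n, m, p = 12, 4, 0.15
direct = [f_mixture(y, n, m, p) for y in range(n)]
closed = [f_closed(y, n, m, p) for y in range(n)]
print(max(abs(a - b) for a, b in zip(direct, closed)), sum(direct))

# Setting of the log-log plot: n = 6000, m = 1000, lambda = 2mp = 1.
n_big, m_big = 6000, 1000
p_big = 1 / (2 * m_big)
for y in (1, 2, 5, 10, 20):
    print(y, math.log10(f_mixture(y, n_big, m_big, p_big)))
```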

Log-log plot of the distribution

f(y) is plotted for n = 6000 proteins, m = 1000 domains and λ = 2mp = 1 or 2 (the average number of domains per protein).

The resulting curves are clearly non-linear on the log-log scale, so the predicted degree distribution is not a power law.

Fig: Log–log plot of the distribution of vertex degrees in the modelled interactome with 6000 proteins, 1000 domains and an average of 1 or 2 domains per protein, shown as solid and dotted lines, respectively.

Degree distribution of sampled sub-graphs

A total of 450 proteins were sampled at random, with a mean of 5 neighbors per protein in the sample. The resulting graph has approximately the same numbers of vertices and edges as the Uetz dataset.

Fig: The Ito and Uetz datasets are plotted in black and blue, respectively. A straight-line (power-law) fit is shown as a dotted line. The distribution obtained by sampling from this model with 6000 proteins, 1000 domains and an average of 1 domain per protein is plotted in red.
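A sketch of the sampling step, assuming that sampling a set of proteins means keeping only the interactions between sampled proteins (the induced sub-graph); the ring-shaped toy graph and the sample size are hypothetical.

```python
import random

def sample_subgraph(edges, nodes, k, seed=0):
    """Pick k proteins uniformly at random; keep only edges between picked proteins."""
    rng = random.Random(seed)
    chosen = set(rng.sample(sorted(nodes), k))
    sub_edges = [(a, b) for a, b in edges if a in chosen and b in chosen]
    return chosen, sub_edges

# Hypothetical toy graph: a ring of 20 proteins plus two chords.
nodes = range(20)
edges = [(i, (i + 1) % 20) for i in range(20)] + [(0, 10), (5, 15)]
chosen, sub_edges = sample_subgraph(edges, nodes, 8)
print(len(chosen), len(sub_edges))
```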

Degree distribution of sampled sub-graphs

A total of 1500 proteins were sampled at random. The resulting plot shows that this model fits the data better than a power law does.

Fig: The DIP dataset is plotted in black. The distribution obtained by sampling from this model with 6000 proteins, 1000 domains and an average of 2 domains per protein is plotted in red. A straight line (power law) fit is shown as a dotted line.

Conclusions

The degree distribution predicted by this model fits the data better than a power-law distribution does.

This also holds for randomly sampled subnets: the model fits them better than a power law does.

Example

This model can be used to infer the existence of interactions not yet detected experimentally, by using the predicted bipartite structure of sub-graphs.

The figure strongly suggests that c-Raf1, PLC-, RALGDS, AF-6, RLF and SUR-8 contain a motif that interacts with a complementary motif in R-Ras, Rap1A, KRAS2B, RIN, RIBB, N-Ras and H-Ras. This would imply that, for instance, RLF and AF-6 should interact with Rap1A and R-Ras in order to complete the bipartite graph.
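The prediction step amounts to listing the pairs missing from a complete bipartite graph. In this sketch the two protein sets and the observed interactions are hypothetical placeholders (P1, ..., Q2), not the proteins in the figure.

```python
from itertools import product

def missing_bipartite_edges(left, right, observed):
    """Pairs (l, r) that would complete the bipartite graph left x right."""
    seen = {frozenset(e) for e in observed}          # interactions are undirected
    return sorted((l, r) for l, r in product(left, right)
                  if frozenset((l, r)) not in seen)

# Hypothetical observed interactions (placeholders, not real data).
left = ["P1", "P2", "P3"]
right = ["Q1", "Q2"]
observed = [("P1", "Q1"), ("P1", "Q2"), ("P2", "Q1"), ("Q2", "P3")]
missing = missing_bipartite_edges(left, right, observed)
print(missing)  # [('P2', 'Q2'), ('P3', 'Q1')]
```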

Model-II

The Yeast Protein Interaction Network Evolves Rapidly and Contains Few Redundant Duplicate Genes.

Evolution of Function

Examples:

Partially redundant duplicates:

CLN1/2/3: involved in regulating the activity of the yeast cyclin-dependent kinase. Ks = 2.4 (over 200 Myr).

TPK1/2/3: catalytic subunits of the yeast cyclic AMP-dependent protein kinase. Ks = 1.31.

Diverged gene function:

EDN vs. ECP: EDN has high RNase activity and acts as an antiretroviral agent, whereas ECP is an antibacterial toxin.

Dopa decarboxylase and amd: the duplicates are expressed in different parts of the cell and therefore have different biological functions.

Objective

Two main questions are addressed in this model:

At what rate does functional divergence occur after gene duplication, for a large sample of duplicated genes in a genome?

What effect do the products of duplicated genes have on the protein-protein interaction network?

Data for Analysis

The protein-protein interaction data come from a large-scale experiment (Uetz et al. 2000) using the yeast two-hybrid system (Fields and Song 1989).

985 proteins, 899 interactions and 45 self-interactions.

Data for duplicated genes were obtained from the University of Oregon and are described by Ks, the fraction of synonymous (silent) substitutions per synonymous site.

Ks measures the divergence between two duplicate genes and serves as a proxy for the time since duplication. Only gene pairs with Ks < 5 were considered for further analysis; there were 9,059 such pairs among ~6,000 genes.

Power Law Random Graphs-I

PL random graphs are random graphs whose degree probability distribution P(d) is proportional to $d^{-\gamma}$ for some constant γ.

First, n = 6279 isolated nodes were generated, and a random integer d > 0 was generated for a randomly chosen node.

This random number d was generated from a formula involving r, a random real number uniformly distributed in the interval (0, 1), and a constant.

Power Law Random Graphs-II

Second, this number d was accepted with a probability that decreases as a power of d.

This acceptance probability acts as a weighting function, so that the resulting distribution of d is a power law.

If d was discarded, a new d was generated according to the same prescription, and this process was repeated until a d was accepted.

Once d was accepted, it was assigned to the randomly chosen node.
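The accept/reject step can be sketched as standard rejection sampling. Because the proposal formula is not reproduced in this transcript, the sketch assumes a uniform proposal on 1..d_max and an acceptance probability of d^(−γ), which yields P(d) ∝ d^(−γ) on that range; `d_max` and `gamma` are illustrative parameters, not values from the study.

```python
import random

def sample_degree(gamma, d_max, rng):
    """Rejection sampler for P(d) proportional to d**(-gamma) on 1..d_max.

    Assumption: uniform proposal on 1..d_max (not the original proposal formula).
    """
    while True:
        d = rng.randint(1, d_max)        # propose a candidate degree
        if rng.random() < d ** -gamma:   # accept with probability d^(-gamma)
            return d                     # otherwise discard it and propose again

rng = random.Random(1)
draws = [sample_degree(2.0, 20, rng) for _ in range(20000)]
ratio = draws.count(1) / draws.count(2)
print(ratio)  # should be close to 2**gamma = 4
```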

Power Law Random Graphs-III

Another node was chosen at random (without replacement of previously chosen nodes), an integer d was assigned to it in the same way, and this process was repeated until the sum S of all integers assigned to the chosen nodes first exceeded 2k, where k is the number of edges.

The integer assigned to each node corresponds to the node's degree.

Nodes were then connected according to their assigned degrees until the number of edges reached S/2 ≈ k.
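Connecting nodes according to their assigned degrees can be sketched with stub matching (the configuration model); this is an assumption about how the step was implemented. Each node contributes as many "stubs" as its degree, and shuffled stubs are paired. The degree assignment below is hypothetical.

```python
import random

def connect_by_degree(degrees, seed=0):
    """Randomly pair degree 'stubs' (configuration model).

    degrees maps node -> assigned degree. Self-loops and repeated pairs are
    dropped, so realised degrees can be slightly below the assigned ones.
    """
    rng = random.Random(seed)
    stubs = [node for node, d in degrees.items() for _ in range(d)]
    rng.shuffle(stubs)
    if len(stubs) % 2:
        stubs.pop()  # an odd total S leaves one stub unpaired
    edges = set()
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return edges

# Hypothetical degrees: S = 10 stubs, so at most k = S/2 = 5 edges.
degrees = {"A": 3, "B": 2, "C": 2, "D": 1, "E": 2}
edges = connect_by_degree(degrees)
print(len(edges), sorted(edges))
```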

Interaction Network vs. Random Graph

Comparison of the protein interaction network (n = 985 nodes, k = 899 edges) with random graphs.

The PPI network has an excess of proteins with degree 1, but fewer proteins with higher degrees, than the ER random graph.

In contrast, the degree distribution of the PPI network is consistent with that of the power-law random graph.

Duplications and Interactions

This figure illustrates the effect of gene duplication on gene products involved in protein interactions.

Divergence of Interactions

20% of duplicate gene pairs with 0.5 < Ks < 1.0 share an interaction partner; that is, approximately 100 Myr after duplication, 80% of genes have no interaction partner in common with their duplicates.

For Ks > 2, the fraction approaches the value expected for randomly chosen gene pairs.

Fig: Histogram of the fraction of duplicate genes whose products have at least one interacting protein in common, as a function of Ks. Interactions turn over every 200-300 Myr.

Divergence of Interactions

Only 57% of the most closely related duplicate gene pairs (0 < Ks < 0.5) in which both genes interact with other proteins share any interaction partner in the same subnet.

For the 380 gene pairs with Ks > 0.5, the fraction of duplicate pairs with a shared interaction partner is below 20%.

For Ks > 1.5, this fraction is close to the value expected at random.
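The quantity on these slides, the fraction of duplicate pairs that share at least one interaction partner within a Ks range, could be computed as in this sketch. Gene names, partners and Ks values are invented for illustration, and the random-pair baseline is not computed here.

```python
def shared_partner_fraction(pairs, partners, bins):
    """Per Ks bin, the fraction of duplicate pairs sharing at least one partner.

    pairs: list of (gene1, gene2, ks); partners: dict gene -> set of partners.
    Only pairs in which both genes have at least one partner are counted;
    bins without such pairs are omitted.
    """
    result = {}
    for lo, hi in bins:
        in_bin = [(g1, g2) for g1, g2, ks in pairs
                  if lo <= ks < hi and partners.get(g1) and partners.get(g2)]
        if in_bin:
            shared = sum(1 for g1, g2 in in_bin if partners[g1] & partners[g2])
            result[(lo, hi)] = shared / len(in_bin)
    return result

# Hypothetical toy data (names and Ks values invented for illustration).
partners = {"g1": {"x", "y"}, "g2": {"y"}, "g3": {"z"}, "g4": {"w"},
            "g5": {"x"}, "g6": {"v"}}
pairs = [("g1", "g2", 0.3), ("g3", "g4", 0.4), ("g5", "g6", 1.2)]
fractions = shared_partner_fraction(pairs, partners, [(0, 0.5), (0.5, 1.0), (1.0, 2.0)])
print(fractions)  # {(0, 0.5): 0.5, (1.0, 2.0): 0.0}
```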

The Rate of Interaction Loss

The divergence in protein interaction after gene duplication is largely due to interaction loss.

127 duplicate pairs with Ks < 2 in which both duplicates participate in the protein-protein interaction network.

920 interactions were present after duplication, 429 of which have since been lost, at a rate of 2.3 × 10^-3 per Myr.

Is this estimate too low or too high? Noise in the interaction data leads to overestimates, whereas young pairs and double losses (interactions lost from both duplicates) lead to underestimates.

Divergence of Self-interactions

Loss or gain of interactions between a pair of paralogs due to self-interaction.

Self-Interactions and interactions between products of duplicate genes.

Divergence of Self-interactions

A total of 25 paralogs.

Only a few conserved self-interactions were found.

New interactions

13/25 new interactions, which arise at a rate of 2.88 × 10^-6 per protein pair per Myr. Ks = 1 corresponds to 100 Myr.

Conclusions

Protein-protein interaction network shows a power-law degree distribution.

The yeast genome contains a total of 6280 ORFs, giving 1.97 × 10^7 possible pairwise interactions.

New interactions form slowly per protein pair: at a rate of 2.88 × 10^-6 per protein pair per million years.

Extrapolating this estimate to the entire yeast proteome yields 1.97 × 10^7 × 2.88 × 10^-6 ≈ 57 newly evolved interactions per million years.
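The arithmetic behind the extrapolation, assuming the possible interactions are the unordered pairs of distinct ORFs, n(n−1)/2:

```python
n_orfs = 6280
possible_pairs = n_orfs * (n_orfs - 1) // 2   # unordered pairs of distinct ORFs
rate_per_pair = 2.88e-6                       # new interactions per pair per Myr
new_per_myr = possible_pairs * rate_per_pair
print(possible_pairs, round(new_per_myr))     # 19716060 57
```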

Model-III- Cluster Analysis

Detection of Functional Modules from Protein Interaction Networks of S. cerevisiae.

Cluster Analysis

Cluster analysis (CA) is a natural choice of methodology for extracting functional modules from protein interaction networks.

Clustering is the grouping of objects based on shared, discrete, measurable properties.

In functional genomics, clustering algorithms have been devised for multiple tasks, such as mRNA expression analysis and the detection of protein families.

The aim of this model is to detect biologically meaningful patterns in the entire known protein interaction network of S. cerevisiae.

Clustering Algorithm

The protein interaction data were obtained from the DIP (Database of Interacting Proteins).

The network of proteins is first transformed into a weighted graph.

The weight attributed to each interaction reflects the confidence level, represented by the number of experiments that support the interaction.

A score of 3.0 was assigned to the first instance of an interaction; the score was increased by 1 if the interaction was supported by another method, or by 0.25 if the interaction had already been observed by that method.
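A sketch of the confidence scoring, assuming observations arrive in order as (interaction, method) records; the protein and method names are hypothetical.

```python
from collections import defaultdict

def score_interactions(observations):
    """Confidence score per interaction from ordered (interaction, method) records.

    First observation: 3.0. Each later observation adds 1.0 if its method has not
    yet reported this interaction, or 0.25 if that method already reported it.
    """
    scores = {}
    methods_seen = defaultdict(set)
    for pair, method in observations:
        key = frozenset(pair)  # interactions are undirected
        if key not in scores:
            scores[key] = 3.0
        elif method in methods_seen[key]:
            scores[key] += 0.25
        else:
            scores[key] += 1.0
        methods_seen[key].add(method)
    return scores

# Hypothetical observations.
obs = [(("P1", "P2"), "two-hybrid"), (("P2", "P1"), "two-hybrid"),
       (("P1", "P2"), "co-IP"), (("P3", "P4"), "two-hybrid")]
scores = score_interactions(obs)
print({tuple(sorted(k)): v for k, v in scores.items()})
```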

Clustering Algorithm

The result is a weighted network of proteins connected by edges.

This weighted graph is then converted into a line graph L(G), in which each edge (interaction) of the original graph becomes a node, and two such nodes are connected when the corresponding interactions share a protein.

Clustering Algorithm

The scores of the two constituent interactions are then averaged and assigned to each edge of the line graph.
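The line-graph transformation with averaged scores can be sketched as follows, assuming a dictionary of weighted interactions without self-interactions or duplicate pairs; the toy scores are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def weighted_line_graph(weighted_edges):
    """Line graph L(G) of a weighted protein graph.

    Each interaction becomes a node of L(G); two such nodes are linked when the
    interactions share a protein, with weight = average of the two scores.
    """
    by_protein = defaultdict(list)
    for a, b in weighted_edges:
        by_protein[a].append((a, b))
        by_protein[b].append((a, b))
    lg = {}
    for incident in by_protein.values():
        for e1, e2 in combinations(incident, 2):
            lg[frozenset((e1, e2))] = (weighted_edges[e1] + weighted_edges[e2]) / 2
    return lg

# Hypothetical weighted interactions.
g = {("A", "B"): 3.0, ("B", "C"): 4.25, ("C", "D"): 3.0}
lg = weighted_line_graph(g)
for pair, w in lg.items():
    print(sorted(pair), w)
```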

TribeMCL, a graph-clustering program based on the Markov Cluster (MCL) algorithm, was used to cluster the line graph of the interaction network and recover clusters of associated interactions.

These clusters range in size from 2 to 292 components (average size 8.05) and form a scale-free protein network.
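For intuition about the clustering step, here is a minimal Markov Cluster (MCL) sketch: alternate expansion (matrix squaring) and inflation (element-wise powering with column renormalisation) until the matrix stops changing, then read clusters off the attractor rows. It is a plain-Python illustration, not TribeMCL itself; the inflation value, pruning threshold and toy graph are assumptions, and in the study the input would be the weighted line graph rather than a small protein graph.

```python
def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix for n nodes."""
    adj = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1.0
    return adj

def normalize_columns(mat):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(mat)
    for j in range(n):
        s = sum(mat[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                mat[i][j] /= s
    return mat

def mcl(adj, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-6):
    """Minimal MCL sketch; returns clusters as sorted lists of node indices."""
    n = len(adj)
    # Add self-loops, then normalise columns: a random-walk transition matrix.
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        # Expansion: two random-walk steps (matrix squared).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones, prune tiny values.
        inflated = normalize_columns([[v ** inflation if v > prune else 0.0 for v in row]
                                      for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that keeps mass is an attractor; its nonzero columns form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > prune) for i in range(n)}
    clusters.discard(frozenset())
    return sorted((sorted(c) for c in clusters), key=lambda c: c[0])

# Hypothetical toy graph: two triangles joined by the single edge 2-3.
barbell = adjacency(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
print(mcl(barbell))
```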

Results

A total of 1046 clusters were obtained.

In this analysis, each protein was on average present in 2.1 clusters.

Only 76 interactions and 146 proteins (less than 1% of the total data), which were weakly connected to the main interaction network, were discarded by the clustering method.

The clusters were classified according to three schemes describing the functional involvement of proteins in different mechanisms:

KEGG: regulatory and metabolic classification (20%).

GQFC: GeneQuiz automatic functional classification (45%).

MIPS: cellular localization (48%).

Validation of the Clustering Method-I

Scoring the clusters: clusters are validated by assessing the consistency of protein classification within each individual cluster.

This is measured, for each of the three classification schemes, by calculating the redundancy of each cluster j:

$R_j = 1 - \frac{-\sum_{s=1}^{n} P_s \log_2 P_s}{\log_2 n}$

where R_j is the redundancy of cluster j, n is the number of classes in the classification scheme, and P_s is the relative frequency of class s in cluster j. The numerator is the information content in bits, given by the entropy H; the denominator is a normalizing factor, the maximum entropy H_max for cluster j.
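The redundancy score can be computed directly from the class labels of a cluster's members, as in this sketch; the labels are hypothetical, and classes absent from the cluster contribute nothing to H because 0 · log 0 is taken as 0.

```python
import math
from collections import Counter

def redundancy(class_labels, n_classes):
    """R_j = 1 - H / H_max, with H the entropy (bits) of the cluster's class
    frequencies and H_max = log2(n_classes). Requires n_classes >= 2."""
    counts = Counter(class_labels)
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    h_max = math.log2(n_classes)
    return 1.0 - h / h_max

print(redundancy(["a", "a", "a", "a"], 4))  # 1.0: all proteins share one class
print(redundancy(["a", "a", "b", "b"], 4))  # 0.5
print(redundancy(["a", "b", "c", "d"], 4))  # 0.0: maximally mixed
```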

Validation of the Clustering Method-II

Fig.: Module validation using biological classification schemes

Validation of the Clustering Method-III

Fig.: Module validation using biological classification schemes

Validation of the Clustering Method-IV

Fig.: Module validation using biological classification schemes

Example-I Cluster 55

Cluster 55 recovers a set of protein interactions (inset) involved in vacuolar transport and fusion, from the ER via the pre-vacuolar compartment.

Examples-II clusters 32 and 86

Recovery of the signal transduction pathway controlling cell wall biogenesis, from the membrane protein (Fks1) to the transcription factors activated by this pathway (Swi4, Swi6 and Rlm1). The pathway was recovered as a set of two clusters connected by two proteins (Pkc1p and Smd3p), showing a one-to-many relationship.

Network of functional modules

This graph shows 40 functional modules connected by shared proteins.
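A module-level network like this one could be built by linking any two modules that share proteins, as in this sketch with hypothetical modules; the link weight is the number of shared proteins.

```python
from itertools import combinations

def module_network(modules):
    """Link modules i < j that share proteins; weight = number of shared proteins."""
    links = {}
    for (i, a), (j, b) in combinations(enumerate(modules), 2):
        shared = len(a & b)
        if shared:
            links[(i, j)] = shared
    return links

# Hypothetical modules (sets of proteins).
modules = [{"P1", "P2", "P3"}, {"P3", "P4"}, {"P5", "P6"}, {"P4", "P6"}]
print(module_network(modules))  # {(0, 1): 1, (1, 3): 1, (2, 3): 1}
```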

Conclusions

This model can be used to place poorly characterized proteins in their functional context according to their interacting partners within a module.

The predictive power of this model allows us to examine the organization and coordination of multiple complex cellular processes and to determine how they are organized into pathways.

One-to-many relationships between modules can be used for pathway discovery.

References

A. Thomas, R. Cannings, N.A.M. Monk and C. Cannings. On the structure of protein–protein interaction networks. Biochemical Society Transactions 31(6), 2003.

A. Wagner. The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Molecular Biology and Evolution 18(7):1283–1292, 2001.

J.B. Pereira-Leal, A.J. Enright and C.A. Ouzounis. Detection of functional modules from protein interaction networks. Proteins: Structure, Function, and Bioinformatics 54:49–57, 2004.