
The Relative Vertex-to-Vertex Clustering Value

A New Criterion for the Fast Detection of Functional Modules in Protein Interaction Networks

Zina Mohamed Ibrahim

(King’s College, London, UK)

Alioune Ngom

(University of Windsor, Windsor, Canada)


Protein Complexes and Functional Modules

Protein complex: Proteins interacting with each other at the same time and place [Spirin et al. 2004]

Functional module: Set of proteins involved in a common elementary biological function

  Members bind each other at different times and places
  May comprise multiple protein complexes [Chen et al. 2005]


Identification of Functional Modules

Protein Interaction Networks (PINs)
  Functional modules correspond to highly connected subgraphs in a PIN

Many graph clustering approaches
  Clique-based methods: strict and not scalable to large PINs
  Density-based methods: issues with low-degree nodes and low topological connectivity
  Hierarchical methods
    Hierarchical organization of the modules within PINs
    Global metric: not scalable to large PINs
    Local metric: common misclassification of low-degree nodes
    Poor performance on noisy PINs, i.e., PINs with false-positive interactions


Graph Clustering

Find non-overlapping communities in PINs


Hierarchical Methods -- Related Works

Divisive Approaches: iteratively remove an edge with the
  Highest Edge Betweenness Score: CNM method [Clauset et al. 2004], O(m h log n)
  Lowest Edge Clustering Coefficient: Radicchi method [Radicchi et al. 2004], O(m²)

$$C^{(3)}_{u,v} = \frac{Z^{(3)}_{u,v}}{\min[(k_u - 1), (k_v - 1)]}$$

These are global measures


Hierarchical Methods -- Related Works

Agglomerative Approaches:

Iteratively merge two clusters Cu and Cv

Edge Clustering Value:

Local similarity metric between nodes

HC-PIN Algorithm [Wang et al 2011]

$$ECV(u, v) = \frac{|N_u \cap N_v|^2}{|N_u| \times |N_v|}$$
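The edge clustering value can be computed directly from adjacency sets. The following is a minimal sketch, assuming that N_u is the closed neighborhood of u (its neighbors plus u itself); the helper names and the toy graph are hypothetical and not taken from HC-PIN.

```python
def closed_neighborhood(adj, v):
    """Neighbors of v together with v itself (assumed definition of N_v)."""
    return adj[v] | {v}


def ecv(adj, u, v):
    """Edge clustering value: |N_u ∩ N_v|^2 / (|N_u| * |N_v|)."""
    nu = closed_neighborhood(adj, u)
    nv = closed_neighborhood(adj, v)
    return len(nu & nv) ** 2 / (len(nu) * len(nv))


# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(ecv(adj, "a", "b"))  # edge inside the triangle
print(ecv(adj, "c", "d"))  # edge to the pendant node
```

ECV is symmetric in u and v, so it scores the edge rather than the direction of a possible merge.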


Our New Criterion – Un-weighted PINs

Relative Vertex-to-Vertex Clustering Value

$$R(u \to v) = 100 \times \frac{|N_u \cap N_v|}{|N_u|}$$

where N_a = (neighbors of a) ∪ {a}

0 ≤ R(u → v) ≤ 100

Likelihood of u to be in v’s cluster
  Not the likelihood that both u and v lie in the same cluster

Local similarity pre-metric

Principle of preferential attachment in scale-free networks
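A short sketch of the un-weighted criterion, assuming adjacency sets and closed neighborhoods as defined above. The function name `rcv` and the toy graph are illustrative choices, not part of the original slides.

```python
def rcv(adj, u, v):
    """Relative vertex-to-vertex clustering value R(u -> v), in [0, 100]."""
    nu = adj[u] | {u}  # closed neighborhood of u
    nv = adj[v] | {v}  # closed neighborhood of v
    return 100.0 * len(nu & nv) / len(nu)


# Hypothetical toy network: a triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(rcv(adj, "d", "c"))  # d shares its whole neighborhood with c
print(rcv(adj, "c", "d"))  # c shares only half of its neighborhood with d
```

The two calls show the asymmetry: the low-degree node d is drawn toward the cluster of c, but not the other way round, which is the preferential-attachment behavior the criterion relies on.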


Our New Criterion – Weighted PINs

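$$R_w(u \to v) = 100 \times \frac{\sum_{a \in I_{u,v};\, (u,a) \in E} w(u, a)}{\sum_{b \in N_u;\, (u,b) \in E} w(u, b)}$$

Where,

$I_{u,v} = N_u \cap N_v$

w(x, y) = weight on interaction edge (x, y)

Weighted degree of x: $\sum_{y \in V;\, (x,y) \in E} w(x, y)$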
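The weighted criterion can be sketched in the same way. This example assumes a symmetric dict-of-dicts weight map and closed neighborhoods; `build_weights`, `weighted_rcv` and the edge weights are hypothetical names and values chosen for illustration.

```python
def build_weights(edges):
    """Build a symmetric weight map w[x][y] from (x, y, weight) triples."""
    w = {}
    for x, y, weight in edges:
        w.setdefault(x, {})[y] = weight
        w.setdefault(y, {})[x] = weight
    return w


def weighted_rcv(w, u, v):
    """R_w(u -> v): weight from u into I_{u,v}, relative to u's weighted degree."""
    nu = set(w[u]) | {u}
    nv = set(w[v]) | {v}
    shared = nu & nv  # I_{u,v} = N_u ∩ N_v
    numerator = sum(w[u][a] for a in shared if a in w[u])  # only real edges (u, a)
    weighted_degree = sum(w[u].values())
    return 100.0 * numerator / weighted_degree if weighted_degree else 0.0


# Hypothetical weighted toy network.
w = build_weights([("a", "b", 1.0), ("a", "c", 3.0), ("b", "c", 2.0), ("c", "d", 3.0)])

print(weighted_rcv(w, "d", "c"))
print(weighted_rcv(w, "c", "d"))
```

When every weight equals 1 the numerator counts the neighbors of u that lie in I_{u,v} and the denominator is the degree of u, so u itself is not counted in either sum.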


FAC-PIN Algorithm – Test for Inclusion

Insert u into Cv whenever

1. R(u → v) = 100

2. R(u → v) > R(v → u)

3. R(u → v) = R(v → u) and
   1. R(u → v) = R(v → u) = 100 or
   2. R(u → v) > 50, i.e. $|N_u \cap N_v| > \frac{|N_u|}{2}$

That is, whenever: R(u → v) > 50μ and R(u → v) ≥ R(v → u)

Algorithm: for each v, iteratively insert its neighbors u into Cv whenever the test is true for u.


FAC-PIN Algorithm - Clustering

Initialization Phase
  Form singleton cluster C(v) for each v

Community Detection Phase
  For each v, include each neighbor u into C(v) whenever
  [ Rw(u → v) > 50μ and Rw(u → v) ≥ Rw(v → u) ] is true,
  with merging parameter: 0 ≤ μ < 2

Partition Computation Phase
  Obtain the induced subgraph of G for each C(v) as sub-network cluster

Evaluation Phase
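The first three phases can be sketched on an un-weighted toy network as follows. This is an illustration under stated assumptions rather than the authors' implementation: the slides do not say how C(u) and C(v) are combined, so the sketch assumes that inserting u into C(v) merges the two clusters (tracked with a union-find structure), and the names `rcv` and `fac_pin_sketch` are hypothetical.

```python
def rcv(adj, u, v):
    """R(u -> v) on closed neighborhoods, in [0, 100]."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return 100.0 * len(nu & nv) / len(nu)


def fac_pin_sketch(adj, mu=1.0):
    # Initialization phase: every node starts in its own singleton cluster.
    parent = {v: v for v in adj}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Community detection phase: test each neighbor u of each node v.
    for v in adj:
        for u in adj[v]:
            r_uv, r_vu = rcv(adj, u, v), rcv(adj, v, u)
            if r_uv > 50 * mu and r_uv >= r_vu:
                parent[find(u)] = find(v)  # assumption: insertion merges clusters

    # Partition computation phase: collect the node set of each cluster.
    clusters = {}
    for v in adj:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy network: two triangles joined by the bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}

print(fac_pin_sketch(adj, mu=1.0))
print(fac_pin_sketch(adj, mu=0.5))
```

On this toy graph the bridge endpoints have R(c → d) = R(d → c) = 50, so they stay apart at μ = 1 (threshold 50, strict inequality) and are merged at μ = 0.5 (threshold 25).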



Computational Complexities

Given n nodes and m edges

CNM Algorithm: O(m h log n), h = height
Radicchi Algorithm: O(m²)
HC-PIN Algorithm: O(m δ²)
FAC-PIN Algorithm: O(n δ²) << O(n D²)

δ = average degree and D = maximum degree


Computational Experiments

For any given PIN:

1. Apply FAC-PIN with merging parameters μ

2. Evaluate modularity of resulting partitions Pk,μ

Three modularity functions

3. Pk = best Pk,μ

4. Execution time to obtain Pk,μ

5. Functional enrichment validations with SGD GO
   P-value cutoff = 0.05
   Retain significant clusters and the number of significant clusters


Data Sets

8 un-weighted PIN data sets from the REACTOME database
  Including the S. cerevisiae PIN (yeast SC-1): 5697 proteins, 50675 interactions

1 un-weighted PIN and the corresponding weighted PIN of S. cerevisiae (yeast SC-2) from the DIP database
  4726 proteins, 15166 interactions

Protein complexes from MIPS database


Results – Effect of Merging Parameter μ (SC-2; 4726 proteins and 15166 interactions)

• Recall: merging test = [ Rw(u → v) > 50μ and Rw(u → v) ≥ Rw(v → u) ]

• Fewer neighbors are merged with v as μ increases, hence the number of clusters k increases with μ


Results – Execution Times in Seconds (PINs from Reactome database; μ = 0.5)


Results – Modularity Functions

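Function Q:

$$Q_w(P_k) = \sum_{i=1}^{k} \left( e_{ii} - a_i^2 \right)$$

Function Ω:

$$\Omega_w(P_k) = -\sum_{i=1}^{k} e_{ii} \log a_i$$

Function D:

$$D_w(P_k) = \sum_{i=1}^{k} \frac{L(C_i, C_i) - L(C_i, \overline{C_i})}{|C_i|}$$

where

$$L(S_1, S_2) = \sum_{u \in S_1,\, v \in S_2} w(u, v), \qquad e_{ii} = \frac{L(C_i, C_i)}{L(V, V)}, \qquad a_i = \frac{L(C_i, V)}{L(V, V)}$$

w(u, v) = 0 or 1 for un-weighted PINs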
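As a worked illustration of functions Q and D, the sketch below evaluates both on a small partition. It assumes a symmetric dict-of-dicts weight map, so that L(S1, S2) sums over ordered pairs and each internal edge contributes twice to L(C, C); the function names and the toy network are hypothetical.

```python
def link_weight(w, s1, s2):
    """L(S1, S2): total weight over ordered pairs (u, v), u in S1 and v in S2."""
    return sum(w[u].get(v, 0.0) for u in s1 for v in s2)


def modularity_q(w, partition):
    """Q = sum_i (e_ii - a_i^2)."""
    nodes = set(w)
    total = link_weight(w, nodes, nodes)
    q = 0.0
    for cluster in partition:
        e_ii = link_weight(w, cluster, cluster) / total
        a_i = link_weight(w, cluster, nodes) / total
        q += e_ii - a_i ** 2
    return q


def modularity_density(w, partition):
    """D = sum_i [L(C_i, C_i) - L(C_i, complement of C_i)] / |C_i|."""
    nodes = set(w)
    return sum(
        (link_weight(w, c, c) - link_weight(w, c, nodes - c)) / len(c)
        for c in partition
    )


# Hypothetical un-weighted toy network: two triangles joined by the bridge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
w = {}
for x, y in edges:
    w.setdefault(x, {})[y] = 1.0
    w.setdefault(y, {})[x] = 1.0

partition = [{"a", "b", "c"}, {"d", "e", "f"}]
print(modularity_q(w, partition))        # 2 * (6/14 - 0.25) = 5/14
print(modularity_density(w, partition))  # 2 * (6 - 1) / 3 = 10/3
```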


Results – Modularity of FAC-PIN Partitions (PINs from Reactome database; μ = 0.5)

[Figure panels: Qw, Ωw, Dw]


Functional Module Prediction

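Recall indicates how effectively proteins with the same functional category in the network are extracted

$$RE = \frac{|C \cap F_i|}{|F_i|}$$

Precision illustrates how consistently proteins in the same module are annotated

$$PR = \frac{|C \cap F_i|}{|C|}$$

where C is a predicted module and F_i is the set of proteins annotated with functional category i

f-measure is used to evaluate the overall performance

$$FM = \frac{2 \times PR \times RE}{PR + RE}$$

Average f-measure as the accuracy of the algorithms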
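A small sketch of the three scores for one module and one functional category, assuming both are given as Python sets of protein identifiers. The function name and the protein labels are made up for illustration.

```python
def module_scores(module, category):
    """Return (recall, precision, f_measure) for a module C and a category F_i."""
    overlap = len(module & category)
    if overlap == 0:
        return 0.0, 0.0, 0.0  # avoids division by zero when nothing matches
    recall = overlap / len(category)
    precision = overlap / len(module)
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure


# Hypothetical example: 3 of the 4 module proteins belong to a 6-protein category.
module = {"p1", "p2", "p3", "p4"}
category = {"p2", "p3", "p4", "p5", "p6", "p7"}

recall, precision, f_measure = module_scores(module, category)
print(recall, precision, f_measure)
```

Here recall is 3/6, precision is 3/4, and the f-measure is their harmonic mean, 0.6.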


Functional Enrichment of FAC-PIN Modules

Hypergeometric distribution… …
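The slide names only the hypergeometric distribution. The sketch below uses the upper-tail form commonly applied in GO enrichment analysis, which may differ from the exact formula the authors used; the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, module_size, hits):
    """P(X >= hits) when drawing module_size proteins without replacement.

    population  : number of proteins in the network
    annotated   : proteins in the network carrying the GO term
    module_size : number of proteins in the module
    hits        : module proteins carrying the GO term
    """
    upper = min(module_size, annotated)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, module_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, module_size)


# Hypothetical numbers: 10 proteins, 4 annotated, module of 3 with 2 annotated.
p = enrichment_p_value(population=10, annotated=4, module_size=3, hits=2)
print(p)  # (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120
```

A module would then be retained as significant when its smallest p-value falls below the chosen cutoff.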


Results – Functional Enrichment Validations (Un-weighted SC-1; 5697 proteins and 50675 interactions; μ = 0.5)