The Relative Vertex-to-Vertex Clustering Value 1 A New Criterion for the Fast Detection of...

Post on 24-Dec-2015


1

The Relative Vertex-to-Vertex Clustering Value

A New Criterion for the Fast Detection of Functional Modules in Protein Interaction Networks

Zina Mohamed Ibrahim

(King’s College, London, UK)

Alioune Ngom

(University of Windsor, Windsor, Canada)

2

Protein Complexes and Functional Modules

Protein complex: Proteins interacting with each other at the same time and place [Spirin et al. 2004]

Functional module: Set of proteins involved in a common elementary biological function

Proteins bind each other at different times and places

Can comprise multiple protein complexes [Chen et al. 2005]

3

Identification of Functional Modules

Protein Interaction Networks (PINs)

Functional modules correspond to highly connected subgraphs in a PIN

Many graph clustering approaches:

• Clique-based methods: strict and not scalable to large PINs

• Density-based methods: issues with low-degree nodes and low topological connectivity

• Hierarchical methods: hierarchical organization of the modules within PINs

• Global metric: not scalable to large PINs

• Local metric: common misclassification of low-degree nodes

• Poor performance on noisy PINs, i.e., PINs with false-positive interactions

4

Graph Clustering

Find non-overlapping communities in PINs

5

Hierarchical Methods -- Related Works

Divisive Approaches: iteratively remove an edge with the

• Highest Edge Betweenness Score: CNM method [Clauset et al. 2004], O(m h log n)

• Lowest Edge Clustering Coefficient: Radicchi method [Radicchi et al. 2004], O(m²)

These are global measures

$$C^{(3)}_{u,v} = \frac{Z^{(3)}_{u,v}}{\min[(k_u - 1), (k_v - 1)]}$$
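As a rough illustration of the edge clustering coefficient, the sketch below counts the triangles through an edge (u, v) and divides by the smaller of the two reduced degrees, following the form shown on the slide. The adjacency-dict representation, the function name and the handling of degree-1 endpoints are assumptions made for this example only.

```python
def edge_clustering_coefficient(adj, u, v):
    """Triangles through edge (u, v) divided by the most that could exist."""
    triangles = len(adj[u] & adj[v])  # each common neighbour closes one triangle
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if possible == 0:
        # Assumption: an endpoint of degree 1 cannot lie on a triangle, so the
        # ratio is undefined; infinity keeps such edges from being removed first.
        return float("inf")
    return triangles / possible


# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to b.
adj = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
print(edge_clustering_coefficient(adj, "a", "b"))  # 1.0
print(edge_clustering_coefficient(adj, "b", "d"))  # inf
```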

6

Hierarchical Methods -- Related Works

Agglomerative Approaches:

Iteratively merge two clusters Cu and Cv

Edge Clustering Value:

Local similarity metric between nodes

HC-PIN Algorithm [Wang et al 2011]

$$ECV(u, v) = \frac{|N_u \cap N_v|^2}{|N_u| \times |N_v|}$$
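A minimal sketch of the edge clustering value, ECV(u, v) = |N_u ∩ N_v|² / (|N_u| × |N_v|). To avoid guessing whether a node belongs to its own neighborhood, the function takes the two neighbor sets directly; the function name and the toy sets are hypothetical.

```python
def ecv(n_u, n_v):
    """Edge clustering value of two nodes given their neighbour sets."""
    common = len(n_u & n_v)
    return common ** 2 / (len(n_u) * len(n_v))


# Hypothetical neighbour sets sharing two proteins.
n_u = {"p1", "p2", "p3", "p4"}
n_v = {"p3", "p4", "p5"}
print(ecv(n_u, n_v))  # 2^2 / (4 * 3)
```

The value is symmetric in u and v, which is the property the relative criterion on the next slide deliberately gives up.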

7

Our New Criterion – UnWeighted PINs

Relative Vertex-to-Vertex Clustering Value

0 ≤ R(u → v) ≤ 100

Likelihood of u to be in v’s cluster

Not how likely it is that both u and v lie in the same cluster

Local similarity pre-metric

Principle of preferential attachment in scale-free networks

$$R(u \to v) = 100 \times \frac{|N_u \cap N_v|}{|N_u|}$$

with $N_a \leftarrow N_a \cup \{a\}$ (each node counts as its own neighbor)
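The asymmetry of R is easiest to see on a small example. The sketch below assumes an adjacency dict of neighbor sets and neighborhoods that include the node itself; `closed_neighborhood` and `relative_clustering_value` are names chosen for illustration.

```python
def closed_neighborhood(adj, a):
    """Neighbours of a plus a itself (assumed convention, N_a ∪ {a})."""
    return adj[a] | {a}


def relative_clustering_value(adj, u, v):
    """R(u -> v): percentage of u's neighbourhood shared with v."""
    n_u = closed_neighborhood(adj, u)
    n_v = closed_neighborhood(adj, v)
    return 100.0 * len(n_u & n_v) / len(n_u)


# Hypothetical toy graph: v is a hub, u is a low-degree node attached to it.
adj = {
    "u": {"v", "x"},
    "v": {"u", "x", "y", "z"},
    "x": {"u", "v"},
    "y": {"v"},
    "z": {"v"},
}
print(relative_clustering_value(adj, "u", "v"))  # 100.0
print(relative_clustering_value(adj, "v", "u"))  # 60.0
```

Here all of u's neighborhood lies inside v's, so u is a strong candidate for v's cluster, while the reverse direction scores lower because the hub has neighbors u does not know.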

8

Our New Criterion – Weighted PINs

$$R_w(u \to v) = 100 \times \frac{\sum_{a \in I_{u,v};\ (u,a) \in E} w(u, a)}{\sum_{b \in N_u;\ (u,b) \in E} w(u, b)}$$

Where,

w(x, y) = weight on interaction edge (x, y)

$I_{u,v} = N_u \cap N_v$

Weighted degree of x: $\sum_{y \in V;\ (x,y) \in E} w(x, y)$
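A sketch of the weighted criterion under stated assumptions: the network is a symmetric dict of dicts mapping each node to its neighbors and edge weights, and neighborhoods include the node itself, so that v is part of the shared set whenever u and v interact. The numerator sums the weights of u's edges that end inside the shared neighborhood; the denominator is u's weighted degree. All names and weights are hypothetical.

```python
def weighted_relative_clustering_value(w, u, v):
    """R_w(u -> v) for a symmetric weight map w[node][neighbour] = weight."""
    n_v = set(w[v]) | {v}  # closed neighbourhood of v (assumed convention)
    # Edges (u, a) whose other endpoint a also lies in v's neighbourhood.
    shared_weight = sum(wt for a, wt in w[u].items() if a in n_v)
    weighted_degree = sum(w[u].values())
    return 100.0 * shared_weight / weighted_degree


# Hypothetical weighted toy network.
w = {
    "u": {"v": 0.5, "x": 0.25, "y": 0.25},
    "v": {"u": 0.5, "x": 1.0},
    "x": {"u": 0.25, "v": 1.0},
    "y": {"u": 0.25},
}
print(weighted_relative_clustering_value(w, "u", "v"))  # 75.0
print(weighted_relative_clustering_value(w, "v", "u"))  # 100.0
```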

9

FAC-PIN Algorithm – Test for Inclusion

Insert u into Cv whenever

1. R(u → v) = 100

2. R(u → v) > R(v → u)

3. R(u → v) = R(v → u) and

   1. R(u → v) = R(v → u) = 100 or

   2. R(u → v) > 50, i.e. $|N_u \cap N_v| > \frac{|N_u|}{2}$

That is, whenever: R(u → v) > 50μ and R(u → v) ≥ R(v → u)

Algorithm: for each v, iteratively insert its neighbors u into Cv whenever the test is true for u.
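The condensed form of the test, R(u → v) > 50μ and R(u → v) ≥ R(v → u), fits in one small predicate. The function name and the default μ = 1 are assumptions for illustration; the two R values are expected as percentages in [0, 100].

```python
def should_insert(r_uv, r_vu, mu=1.0):
    """Condensed inclusion test: insert u into C(v) when this is True."""
    return r_uv > 50.0 * mu and r_uv >= r_vu


print(should_insert(100.0, 60.0))         # True: u sits fully inside v's neighbourhood
print(should_insert(60.0, 100.0))         # False: the reverse direction is stronger
print(should_insert(50.0, 50.0))          # False: the threshold is strict
print(should_insert(60.0, 60.0, mu=1.5))  # False: a larger mu raises the bar to 75
```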

10

FAC-PIN Algorithm - Clustering

Initialization Phase

• Form singleton cluster C(v) for each v

Community Detection Phase

• For each v, include each neighbor u into C(v) whenever [ Rw(u → v) > 50μ and Rw(u → v) ≥ Rw(v → u) ] is true, with merging parameter 0 ≤ μ < 2

Partition Computation Phase

• Obtain the induced subgraph of G for each C(v) as sub-network cluster

Evaluation Phase
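The first three phases can be sketched as follows for an un-weighted network. The slides do not say how insertions are turned into a partition, so this sketch assumes that inserting u into C(v) merges the two clusters, tracked with a union-find structure; that choice, the closed neighborhoods and all names are assumptions, not the authors' implementation.

```python
def closed(adj, a):
    return adj[a] | {a}


def r(adj, u, v):
    """Un-weighted R(u -> v) in percent, with closed neighbourhoods."""
    n_u = closed(adj, u)
    return 100.0 * len(n_u & closed(adj, v)) / len(n_u)


def fac_pin_sketch(adj, mu=1.0):
    parent = {v: v for v in adj}  # initialization: singleton clusters

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for v in adj:  # community detection phase
        for u in adj[v]:
            r_uv, r_vu = r(adj, u, v), r(adj, v, u)
            if r_uv > 50.0 * mu and r_uv >= r_vu:
                parent[find(u)] = find(v)  # assumption: merge u's cluster into v's

    clusters = {}
    for v in adj:  # partition computation phase
        clusters.setdefault(find(v), set()).add(v)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy network: two triangles joined by the bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(fac_pin_sketch(adj, mu=1.0))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
print(fac_pin_sketch(adj, mu=0.9))  # [['a', 'b', 'c', 'd', 'e', 'f']]
```

On this toy network the bridge endpoints score exactly 50 in both directions, so μ = 1 keeps the triangles apart while the slightly lower μ = 0.9 lets them merge.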

11

FAC-PIN Algorithm - Clustering

12

Computational Complexities

Given n nodes and m edges

• CNM Algorithm: O(m h log n), h = height

• Radicchi Algorithm: O(m²)

• HC-PIN Algorithm: O(m δ²)

• FAC-PIN Algorithm: O(n δ²) << O(n D²)

δ = average degree and D = maximum degree

13

Computational Experiments

For any given PIN:

1. Apply FAC-PIN with merging parameters μ

2. Evaluate modularity of resulting partitions Pk,μ

Three modularity functions

3. Pk = best Pk,μ

4. Execution time to obtain Pk,μ

5. Functional enrichment validations with SGD GO

   • P-value cutoff = 0.05

   • Retain significant clusters and number of significant clusters

14

Data Sets

8 un-weighted PINs from the REACTOME database

• Including the PIN of S. cerevisiae (yeast SC-1): 5697 proteins, 50675 interactions

1 un-weighted PIN and the corresponding weighted PIN of S. cerevisiae (yeast SC-2) from the DIP database

• 4726 proteins, 15166 interactions

Protein complexes from MIPS database

15

Results – Effect of Merging Parameter μ (SC-2; 4726 proteins and 15166 interactions)

• Recall: merging test = [ Rw(u → v) > 50μ and Rw(u → v) ≥ Rw(v → u) ]

• Fewer neighbors are merged with v as μ increases, hence the number of clusters k increases with μ

16

Results – Execution Times in Seconds (PINs from Reactome database; μ = 0.5)

17

Results – Modularity Functions

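Function Q: $Q_w(P_k) = \sum_{i=1}^{k} (e_{ii} - a_i^2)$

Function Ω: $\Omega_w(P_k) = \sum_{i=1}^{k} e_{ii} \log a_i$

Function D: $D_w(P_k) = \sum_{i=1}^{k} \frac{L(C_i, C_i) - L(C_i, \bar{C_i})}{|C_i|}$

where

$L(S_1, S_2) = \sum_{u \in S_1,\, v \in S_2} w(u, v)$

$e_{ii} = \frac{L(C_i, C_i)}{L(V, V)}$

$a_i = \frac{L(C_i, V)}{L(V, V)}$

w(u, v) = 0 or 1 for un-weighted PINs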
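To make the link function L(S1, S2) and the derived quantities concrete, the sketch below evaluates Q and D on a toy partition. It assumes weights are stored for both orientations of every edge, so L sums over ordered pairs, and that the clusters cover all nodes; Ω is left out. Function names and the example network are hypothetical.

```python
def link_weight(w, s1, s2):
    """L(S1, S2): total weight of ordered pairs (u, v) with u in S1, v in S2."""
    return sum(w.get((u, v), 0.0) for u in s1 for v in s2)


def modularity_q(w, clusters):
    nodes = set().union(*clusters)
    total = link_weight(w, nodes, nodes)  # L(V, V)
    q = 0.0
    for c in clusters:
        e_ii = link_weight(w, c, c) / total
        a_i = link_weight(w, c, nodes) / total
        q += e_ii - a_i ** 2
    return q


def modularity_d(w, clusters):
    nodes = set().union(*clusters)
    return sum(
        (link_weight(w, c, c) - link_weight(w, c, nodes - c)) / len(c)
        for c in clusters
    )


# Hypothetical un-weighted network: two triangles joined by the bridge c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
w = {}
for u, v in edges:
    w[(u, v)] = w[(v, u)] = 1.0  # w(u, v) = 1 for every interaction

clusters = [{"a", "b", "c"}, {"d", "e", "f"}]
print(modularity_q(w, clusters))  # 5/14
print(modularity_d(w, clusters))  # 10/3
```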

18

Results – Modularity of FAC-PIN Partitions (PINs from Reactome database; μ = 0.5)

[Figure: modularity values Qw, Ωw and Dw of the FAC-PIN partitions]

19

Functional Module Prediction

Recall indicates how effectively proteins with the same functional category in the network are extracted

$RE = \frac{|C \cap F_i|}{|F_i|}$

Precision illustrates how consistently proteins in the same module are annotated

$PR = \frac{|C \cap F_i|}{|C|}$

where C is a predicted module and F_i a functional category

f-measure is used to evaluate the overall performance

$FM = \frac{2 \times PR \times RE}{PR + RE}$

Average f-measure as the accuracy of the algorithms
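A small worked example of the three scores, treating a predicted module C and a functional category F_i as sets of protein identifiers. The function names and the protein labels are made up for illustration.

```python
def recall(module, category):
    """RE: share of the functional category recovered by the module."""
    return len(module & category) / len(category)


def precision(module, category):
    """PR: share of the module annotated with the category."""
    return len(module & category) / len(module)


def f_measure(module, category):
    pr, re = precision(module, category), recall(module, category)
    if pr + re == 0:
        return 0.0  # no overlap at all
    return 2 * pr * re / (pr + re)


# Hypothetical module of 4 proteins, 2 of which carry a 6-protein annotation.
module = {"p1", "p2", "p3", "p4"}
category = {"p3", "p4", "p5", "p6", "p7", "p8"}
print(precision(module, category))  # 0.5
print(recall(module, category))     # 2/6
print(f_measure(module, category))  # harmonic mean of the two
```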

20

Functional Enrichment of FAC-PIN Modules

Hypergeometric distribution… …
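The slide does not show the formula, so the sketch below uses the usual upper-tail hypergeometric test for enrichment as an assumption: the probability of seeing at least `hits` annotated proteins in a cluster of `cluster_size` drawn from `n_total` proteins, of which `n_annotated` carry the annotation. It may differ from the exact computation used for the reported results; all parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_annotated, cluster_size, hits):
    """P(X >= hits) for X ~ Hypergeometric(n_total, n_annotated, cluster_size)."""
    upper = min(cluster_size, n_annotated)
    tail = sum(
        comb(n_annotated, i) * comb(n_total - n_annotated, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(n_total, cluster_size)


# Hypothetical numbers: 10 proteins, 4 annotated, cluster of 3 with 2 annotated.
p = enrichment_p_value(n_total=10, n_annotated=4, cluster_size=3, hits=2)
print(p)          # (6*6 + 4*1) / 120
print(p <= 0.05)  # False: not significant at the 0.05 cutoff
```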

21

Results – Functional Enrichment Validations (Un-weighted SC-1; 5697 proteins and 50675 interactions; μ = 0.5)