Reduced Google Matrix approach for exploring biological networks · 2018-10-23

Page 1

Reduced Google Matrix approach for exploring biological networks

Andrei Zinovyev
Institut Curie - INSERM U900 - PSL Research University / Mines ParisTech
Computational Systems Biology of Cancer

Page 2

Biological networks

• Representation of cellular biochemistry at various levels of granularity

RECON2.2: close to complete reconstruction of the human metabolic network (>10k reactions)

Page 3

Mathematical modeling of large networks of chemical reactions for biology?

• Chemical kinetics formalism
• Lack of quantitative parameters
• Flux balance analysis – just stoichiometry
• Modularisation and abstraction
• Approaches based on “thermodynamic” thinking – extracting macrovariables (e.g., formalism of invariant manifolds)
• Approaches based on assumptions on parameter distribution: asymptotology of chemical reaction networks (Gorban, Radulescu, Zinovyev, Chem Eng Sci, 2009)

Figure (network types):
Cell metabolism (including DNA metabolism)
Signaling networks (curation, e.g. SIGNOR or ACSN)
Transcriptional networks (in principle measurable)

Page 4

Biological networks

• Representation of cellular biochemistry at various levels of granularity

Figure panels: “Influence” networks; “Interaction” networks (from Wikipedia)

Page 5

Atlas of Cancer Signaling Network: http://acsn.curie.fr (Kuperstein et al, Oncogenesis, 2015)

Google Maps API

Inna Kuperstein

Page 6

Using ACSN map for visualization of expression data

Figure labels: Data; “Smoothing”
Patchy coloring: difficult to interpret
Smooth coloring: easily interpretable

Page 7

Use of large biological networks in biological data analysis

• Network defines functional proximity between genes/proteins:
  – in the standard approach the functional distance is binary (in the same pathway or not) or discrete (number of steps)
  – similarity function on the network graph: how easy it is to get from A to B without knowing the map
• «Guilt by association» principle
  – Determining «active network regions»
  – «Neighbourhood» gene/protein sets
• Network propagation
  – Propagation of «influence»: local and distant
  – Undirected and directed networks: analogy with heat diffusion or random walks on graphs

Page 8

Figure panels: Initial perturbation; Stationary state

Page 9

Network propagation ends up in a smooth score distribution function defined on the interaction graph

Figure: the same example graph (nodes A, B, C, D, E, F, G, H, I, K) drawn twice, once with a smooth distribution and once with a non-smooth distribution. Legend: High score; Low score.
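
One common way to put a number on "smooth" versus "non-smooth" is the Laplacian quadratic form, which sums the squared score differences across every edge. The sketch below is only an illustration: the edges of the slide's example graph cannot be recovered from the transcript, so it uses a hypothetical six-node path graph, and the function names are made up for this example.

```python
def dirichlet_energy(edges, score):
    """Sum of squared score differences over edges; equals f^T L f."""
    return sum((score[u] - score[v]) ** 2 for u, v in edges)


def laplacian_quadratic_form(nodes, edges, score):
    """Same quantity computed explicitly through the Laplacian L = D - A."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        i, j = index[u], index[v]
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    f = [score[node] for node in nodes]
    return sum(f[i] * lap[i][j] * f[j] for i in range(n) for j in range(n))


# Hypothetical path graph A-B-C-D-E-F (not the graph drawn on the slide).
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]

# A score that decays gradually along the path, and one that alternates.
smooth = dict(zip(nodes, [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]))
patchy = dict(zip(nodes, [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]))

smooth_energy = dirichlet_energy(edges, smooth)
patchy_energy = dirichlet_energy(edges, patchy)
print(smooth_energy < patchy_energy)
```

A low value means neighbouring nodes carry similar scores, which is what network propagation tends to produce.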

Page 10

Network smoothing or spectral graph analysis (Fourier transformation on graphs)
(Rapaport, Zinovyev, Barillot, Vert, BMC Bioinformatics, 2007)

Function on graph = Slow, smooth component + Fast, high-frequency component

Franck Rapaport

Page 11

Classifier smooth on biological graph (Rapaport et al., BMC Bioinformatics, 2007)

Figure panels: “Classical” SVM; SVM done in the reduced subspace of smooth functions (first 20% of Laplacian eigenvalues)
Figure labels, in extraction order: 200 Gy irradiation, 0 h, 7 h, 3 h, No irradiation, 5 h, 0 h, 3 h

Data from Marie Dutreix (Institut Curie)

Page 12

DeDaL: Cytoscape plugin for constructing data-driven network layouts
http://bioinfo-out.curie.fr/projects/dedal/, Czerwinska et al, BMC Sys Biol, 2015

Tissue-specific gene expression data + Network smoothing + Non-linear dimension reduction (manifold learning)

Urszula Czerwinska

Page 13

Network diffusion vs random walk with restart

Figure: the example graph (nodes A, B, C, D, E, F, G, H, I, K) drawn four times, showing the initial state and the stationary state for (simple) diffusion and for random walk with restart. Two of the panels carry the label a.
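
The contrast between the two processes can be sketched in a few lines. This is a minimal illustration under assumptions: simple diffusion is modelled as a plain random walk (no restart), the graph is a hypothetical five-node example rather than the one on the slide, and the names `transition_matrix`, `propagate` and `alpha` are chosen here for illustration.

```python
def transition_matrix(edges, n):
    """Column-stochastic matrix W for a random walk on an undirected graph."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    degree = [sum(row) for row in adj]
    return [[adj[i][j] / degree[j] for j in range(n)] for i in range(n)]


def propagate(w, p0, alpha, steps=2000):
    """Iterate p <- alpha * W p + (1 - alpha) * p0.

    alpha = 1 is a plain random walk (no restart); alpha < 1 sends the
    walker back to the initial state p0 with probability 1 - alpha.
    """
    n = len(p0)
    p = list(p0)
    for _ in range(steps):
        p = [alpha * sum(w[i][j] * p[j] for j in range(n))
             + (1.0 - alpha) * p0[i] for i in range(n)]
    return p


# Hypothetical connected graph: a triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
n = 5
W = transition_matrix(edges, n)

seed = [0.0, 0.0, 0.0, 0.0, 1.0]   # all initial mass on node 4
diffusion = propagate(W, seed, alpha=1.0)
rwr = propagate(W, seed, alpha=0.7)
print([round(x, 3) for x in diffusion])
print([round(x, 3) for x in rwr])
```

Without restart the stationary state forgets the initial perturbation and depends only on node degrees; with restart the stationary state stays concentrated around the seed node.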

Page 14

Random walk with restart (RWR): Google matrix

$G_{ij} = \alpha S_{ij} + (1-\alpha)/N$

$G X = \lambda X$

PageRank = “stationary” eigenvector corresponding to $\lambda = 1$: the probability of visiting the node after infinite time

Figure: example graph with nodes A, B, C, D, E, F, G, H, I, K
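
A minimal sketch of building the Google matrix and obtaining PageRank by power iteration. Assumptions not stated on the slide: S is taken to be column-stochastic (S_ij is the probability of stepping from node j to node i), nodes without outgoing links jump uniformly to all nodes, and the damping value 0.85 and the four-node example network are illustrative.

```python
def google_matrix(edges, n, alpha=0.85):
    """Return G with G[i][j] = alpha * S[i][j] + (1 - alpha) / n.

    edges: (source, target) pairs with nodes numbered 0..n-1.
    """
    adj = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        adj[dst][src] = 1.0
    g = [[0.0] * n for _ in range(n)]
    for j in range(n):
        out_degree = sum(adj[i][j] for i in range(n))
        for i in range(n):
            # Dangling nodes (no outgoing links) jump uniformly (assumption).
            s_ij = adj[i][j] / out_degree if out_degree > 0 else 1.0 / n
            g[i][j] = alpha * s_ij + (1.0 - alpha) / n
    return g


def pagerank(g, tol=1e-12, max_iter=1000):
    """Power iteration for the eigenvector of G with eigenvalue 1."""
    n = len(g)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        new_p = [sum(g[i][j] * p[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Hypothetical network: a cycle 0 -> 1 -> 2 -> 0 plus node 3 pointing to 0.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
G = google_matrix(edges, 4)
P = pagerank(G)
print([round(x, 4) for x in P])
```

Node 3 has no incoming links, so its PageRank comes only from the uniform term (1 - alpha) / N.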

Page 15

Analysing somatic mutations in cancer: TCGA datasets

Page 16

Problem of mutation data analysis (predicting survival)

Figure labels: random prediction; patients; genes

Page 17

Problem of mutation data analysis (clustering with NMF)

Page 18

The role of the total number of mutations (mutational load)

Overlap between mutations is very small
Tumors are very different in mutational load

Page 19

Page 20

NSQN method (Network Smoothing + Quantile Normalization) (Hofree et al, 2013)

Figure: example graph (nodes A, B, C, D, E, F, G, H, I, K) with two nodes marked “mutated”

Page 21

NSQN method (Network Smoothing + Quantile Normalization) (Hofree et al, 2013)

Figure: example graph (nodes A, B, C, D, E, F, G, H, I, K) with two nodes marked “mutated”
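
The two steps named in the slide title can be sketched as follows. This is a simplified illustration, not the published implementation: the smoothing is written as a random walk with restart on an undirected gene network, ties in the quantile normalization are broken by gene index, and the network, the patient profiles and the parameter values are hypothetical.

```python
def smooth_profile(neighbors, profile, alpha=0.5, steps=200):
    """Spread a 0/1 mutation profile over the network (RWR-style)."""
    n = len(profile)
    f = list(profile)
    for _ in range(steps):
        f = [alpha * sum(f[j] / len(neighbors[j]) for j in neighbors[i])
             + (1.0 - alpha) * profile[i] for i in range(n)]
    return f


def quantile_normalize(profiles):
    """Force every patient profile to share the same value distribution."""
    n_genes = len(profiles[0])
    sorted_rows = [sorted(row) for row in profiles]
    # Reference distribution: mean of the r-th smallest value over patients.
    reference = [sum(row[r] for row in sorted_rows) / len(profiles)
                 for r in range(n_genes)]
    normalized = []
    for row in profiles:
        order = sorted(range(n_genes), key=lambda g: row[g])
        new_row = [0.0] * n_genes
        for rank, gene in enumerate(order):
            new_row[gene] = reference[rank]
        normalized.append(new_row)
    return normalized


# Hypothetical gene network: path 0-1-2-3-4 (adjacency lists).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

# Two hypothetical patients with very different mutational loads.
patients = [[1, 0, 0, 0, 0],
            [1, 0, 1, 1, 0]]

smoothed = [smooth_profile(neighbors, p) for p in patients]
nsqn = quantile_normalize(smoothed)
print([round(x, 3) for x in nsqn[0]])
print([round(x, 3) for x in nsqn[1]])
```

After smoothing, unmutated neighbours of mutated genes receive a non-zero score; after quantile normalization, both patients have the same set of values, so differences in mutational load no longer dominate the comparison.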

Page 22

Testing NSQN for survival prediction in cancer (Le Morvan, Zinovyev, Vert, PLoS Comp Biol, 2017)

• TCGA data on 8 cancer types (LUAD, SKCM, GBM, BRCA, KIRC, HNSC, LUSC, OV)
• Benefit from NSQN only for 2 cancer types (LUAD, SKCM)
• Quantile normalization is an essential step! (NS = NSQN without QN)
• Considering first neighbours (SimpNSQN = NSQN with k=1) is enough!

Figure label: random prediction

Marine Le Morvan

Page 23

Instead of network propagation + QN -> NetNorM: equilibrating the number of mutations to k

NetNorM normalizes the mutation matrix using the “guilt by association” principle

Hubs mark mutated “functions”
Orphan nodes are most probably passenger mutations

Figure labels: k=4; proxy
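
A rough sketch of the idea of equilibrating every patient to exactly k mutations. The ranking rules used here (proxy genes chosen by number of mutated neighbours, removal of low-degree mutated genes first, alphabetical tie-breaks) are assumptions chosen to match the hints on the slide and may differ from the published procedure in Le Morvan et al.; the network and gene names are hypothetical.

```python
def netnorm_sketch(neighbors, mutated, k):
    """Return a set of exactly k genes derived from one patient's mutations.

    neighbors: dict mapping each gene to the set of its network neighbours.
    mutated:   set of genes mutated in the patient.
    """
    mutated = set(mutated)
    degree = {gene: len(nbrs) for gene, nbrs in neighbors.items()}
    if len(mutated) > k:
        # Too many mutations: keep the k best-connected mutated genes, so
        # orphan and low-degree genes (likely passengers) are dropped first.
        ranked = sorted(mutated, key=lambda g: (-degree[g], g))
        return set(ranked[:k])
    # Too few mutations: add "proxy" genes with the most mutated neighbours.
    candidates = [g for g in neighbors if g not in mutated]
    candidates.sort(key=lambda g: (-len(neighbors[g] & mutated), -degree[g], g))
    return mutated | set(candidates[:k - len(mutated)])


# Hypothetical network: hub A, a triangle A-B-C, leaf D, orphan E.
neighbors = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": set(),
}

low_load = netnorm_sketch(neighbors, {"B", "D"}, k=3)
high_load = netnorm_sketch(neighbors, {"A", "B", "C", "D", "E"}, k=3)
print(sorted(low_load), sorted(high_load))
```

In the low-load patient the hub A is added as a proxy because two of its neighbours are mutated; in the high-load patient the orphan E and the leaf D are removed first.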

Page 24

NetNorM performs better than NSQN (when both work)

Figure panels: Using only mutations; Adding clinical info
Figure label: random prediction

Page 25

Does the network structure really play a role?

NetNorM benefits more from the real network structure than NSQN

Figure label: random prediction

Page 26

Application of Google Matrix approach to signalling network (SIGNOR) (Lages, Shepelyansky, Zinovyev, PLoS One, 2018)

3k nodes, 7k edges

Page 27

Application of Google Matrix approach to signalling network (SIGNOR) (Lages, Shepelyansky, Zinovyev, PLoS One, 2018)

Page 28

Reduced Google matrix (refer to Frahm & Shepelyansky, arXiv, 2016)

Figure: example graph (nodes A, B, C, D, E, F, G, H, I, K) with direct links and indirect “hidden” links (>0.01)

Dima Shepelyansky, José Lages
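
In the cited Frahm and Shepelyansky formulation, the nodes are split into a selected subset r and the rest s, and the reduced matrix is G_R = G_rr + G_rs (1 - G_ss)^{-1} G_sr: the first term holds the direct links, the second collects all indirect paths through the rest of the network. The sketch below evaluates the inverse as a plain power series, which is adequate only for small examples (the original work uses a more careful numerical scheme and further splits the indirect term). The network, helper names and the damping value 0.85 are illustrative assumptions.

```python
def google_matrix(edges, n, alpha=0.85):
    """Column-stochastic G[i][j] = alpha * S[i][j] + (1 - alpha) / n."""
    adj = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        adj[dst][src] = 1.0
    g = [[0.0] * n for _ in range(n)]
    for j in range(n):
        out_degree = sum(adj[i][j] for i in range(n))
        for i in range(n):
            s_ij = adj[i][j] / out_degree if out_degree > 0 else 1.0 / n
            g[i][j] = alpha * s_ij + (1.0 - alpha) / n
    return g


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def block(g, rows, cols):
    return [[g[i][j] for j in cols] for i in rows]


def reduced_google_matrix(g, reduced, tol=1e-14, max_terms=10000):
    """G_R = G_rr + G_rs (1 - G_ss)^-1 G_sr for the node subset `reduced`.

    Assumes at least one node is left outside the subset.
    """
    rest = [i for i in range(len(g)) if i not in reduced]
    g_rr, g_rs = block(g, reduced, reduced), block(g, reduced, rest)
    g_sr, g_ss = block(g, rest, reduced), block(g, rest, rest)
    # (1 - G_ss)^-1 G_sr as the series G_sr + G_ss G_sr + G_ss^2 G_sr + ...
    term = g_sr
    total = [row[:] for row in g_sr]
    for _ in range(max_terms):
        term = matmul(g_ss, term)
        for i, row in enumerate(term):
            for j, value in enumerate(row):
                total[i][j] += value
        if max(max(row) for row in term) < tol:
            break
    indirect = matmul(g_rs, total)
    size = len(reduced)
    return [[g_rr[i][j] + indirect[i][j] for j in range(size)]
            for i in range(size)]


# Hypothetical network: no direct link 0 -> 1, but a path 0 -> 2 -> 1.
edges = [(0, 2), (2, 1), (1, 3), (3, 0), (2, 4), (4, 2)]
G = google_matrix(edges, 5)
GR = reduced_google_matrix(G, [0, 1])
print([[round(x, 4) for x in row] for row in GR])
```

The reduced matrix is again column-stochastic, and the entry for 0 -> 1 exceeds its direct-link value because of the hidden path through node 2.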

Page 29

Type of data: cell-specific transcriptional regulation network

Signaling network (SN) (SIGNOR database)
Transcription regulation network (TRN), reconstructed from systematic ChIP-Seq experiments

Reduced Google matrix analytically quantifies the global effect of transcriptional feedback

Figure legend: direct; indirect “hidden”

Page 30

Comparing two TRN networks: e.g., “normal” vs “cancer”

TRN1 (“normal”): Normal B-Lymphocytes
TRN2 (“cancer”): Leukemia cell line

Indirect signaling rewiring

Figure legend: direct; indirect “hidden”; nodes B, E, G

Page 31

Comparing two TRN networks: e.g., “normal” vs “cancer” (Lages, Shepelyansky, Zinovyev, PLoS One, 2018)

TRN1 (“normal”): Normal B-Lymphocytes
TRN2 (“cancer”): Leukemia cell line

Indirect signaling rewiring

Figure legend: direct; indirect “hidden”; nodes B, E, G; annotations “PageRank goes down” and “PageRank improves”

Page 32

Change of PageRank and CheiRank in cancer

CheiRank changes were 3 times larger and more biologically interpretable (they involve genes associated with leukemia)
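
CheiRank is the PageRank of the same network with all link directions inverted, so it highlights nodes with strong outgoing influence rather than strong incoming influence. A minimal sketch, assuming a damping value of 0.85, uniform jumps from dangling nodes, and a hypothetical four-node network:

```python
def pagerank(edges, n, alpha=0.85, steps=500):
    """PageRank by power iteration; edges are (source, target) pairs."""
    out_links = [[] for _ in range(n)]
    for src, dst in edges:
        out_links[src].append(dst)
    p = [1.0 / n] * n
    for _ in range(steps):
        # Mass sitting on dangling nodes is spread uniformly (assumption).
        dangling = sum(p[j] for j in range(n) if not out_links[j])
        new_p = [(1.0 - alpha) / n + alpha * dangling / n] * n
        for j in range(n):
            for i in out_links[j]:
                new_p[i] += alpha * p[j] / len(out_links[j])
        p = new_p
    return p


def cheirank(edges, n, alpha=0.85, steps=500):
    """CheiRank: PageRank of the network with every link reversed."""
    return pagerank([(dst, src) for src, dst in edges], n, alpha, steps)


# Hypothetical network: node 0 acts on nodes 1, 2 and 3.
edges = [(0, 1), (0, 2), (0, 3)]
page = pagerank(edges, 4)
chei = cheirank(edges, 4)
print([round(x, 3) for x in page])
print([round(x, 3) for x in chei])
```

Here node 0 has the lowest PageRank (nothing points to it) but the highest CheiRank (it points to everything else).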

Page 33

Genes of a proliferative signature resulting from pan-cancer transcriptomic analysis

Page 34

Genes of a proliferative signature resulting from pan-cancer transcriptomic analysis

More genes are connected into the network
Emergence of a new “hidden” hub BUB1
Connection to PCNA (DNA replication and DNA repair)
Many cell cycle proteins improve in PageRank (AURK)
Connection between STIL (mitotic spindle checkpoint regulator) and CCNA2, CCNE1

Page 35

WikiProteins project: studying the protein network embedded in Wikipedia

A sample of the Wikipedia network of pages at three steps from the “Transhumanism” article (5k nodes, 23k links)

What is transhumanism? http://allthingsgraphed.com/2015/09/16/what-is-transhumanism-wikipedia/

Page 36

“Semantic field” of Wikipedia

Direct links

Page 37

“Semantic field” of Wikipedia

Page 38

Inferring directed network of proteins from Wikipedia using reduced Google Matrix (ongoing work…)

Page 39

Inferring directed network of proteins from Wikipedia using reduced Google Matrix (ongoing work…)

~10000 wiki pages devoted to proteins
5000 proteins with described interactions, 16000 (2013) and 18000 (2017) direct connections

Wiki proteins hairball
This network is embedded in the global Wikipedia network
The rest of the Wikipedia defines a context of hyperlinks (model of external world)

Reduced Google matrix allows finding “hidden” functional interactions between proteins

Page 40

Comparing protein network of direct links and network of hidden links (same density) in Wikipedia (2013)

Figure panels: Direct links; Hidden links

Page 41

Comparing protein network of direct links and network of hidden links (same density) in Wikipedia (2013)

Figure panels: Direct links connectivity distribution; Hidden links connectivity distribution

Page 42

“Hidden” protein communities (2013)

Clustering the hidden network with MCL algorithm

Community labels: Immune system; Cell cycle; Glucagon metabolism; GTPase signaling; Potassium ion transport; Transcription factors; Apoptosis and inflammation; Coagulation; Keratins; Peroxisome; Nuclear receptors; Hormone receptors

Page 43

“Hidden” protein communities (2017)

Community labels: Immune system; Cell cycle (G2M); ? Apoptosis; Potassium ion transport; GTPase signaling; Coagulation; SH2/3 signaling; DNA repair; NFkB; Cell-cell junctions

Page 44

Dynamics of the largest hidden communities from 2013 to 2017 (20 largest)

Figure axes: 2013 communities; 2017 communities

Page 45

Example: Coagulation-related community

Highest local PageRank: Thrombin

Thrombin

Antithrombin

Factor XII

Factor VIII

Transthyretin

Factor VII

Factor IX

Protein S

Protein C

Factor X

Factor V

P-selectin glycoprotein ligand-1

ADAMTS13

Tissue factor

Osteocalcin

Heparin cofactor II

Gamma-glutamyl carboxylase

Annexin A5

Apolipoprotein H

ITGA2B

GAS6

Fibrinogen alpha chain

Fibrinogen beta chain

Matrix gla protein

Carboxypeptidase B2

ITIH2

FGL2

ADAM22

Liver

Fresh frozen plasma

Page 46

Fimbrin

SOS1

Epigen

CDC42

RAC1

RHOA

PAK1

Rac3

Kalirin

RAC2

ARHGEF7

WNK1

RhoG

Rnd1

ANLN

RALB

Dock2

Synergin gamma

EXOC7

Rnd3

RhoD

Rnd2

RhoH

RAPGEF2

Dock7

Dock4

RCC2

FMNL2

Dock3

SRGAP2

Page 47

Birth of new super-large hidden community in 2017

Figure labels: Myoglobin; Hidden network; Direct interactions explaining hidden connections

Page 48

Conclusions

Network propagation is a powerful tool in joint analysis of molecular biology (medical) data and biological networks

First neighbourhood relations seem to be sufficient in practical applications

Google Matrix approach highlights creative elements and detects indirect rewiring events

Wikiprotein hidden communities give an idea of the dynamical hotspots of interest in molecular biology

References

1. Rapaport et al, BMC Bioinformatics 8:35, 2007.
2. Czerwinska et al, BMC Systems Biology 9:46, 2015.
3. Lages et al, PLoS One 13(1):e0190812, 2018.
4. Le Morvan et al, PLoS Comp Biol 13(6):e1005573, 2017.

https://github.com/marineLM/NetNorM
http://bioinfo-out.curie.fr/projects/dedal

Page 49

Acknowledgements

Topics: Mutation data analysis; Data-driven network layouts; ACSN; Network smoothing

Jean-Philippe Vert (Ecole des Mines)
Emmanuel Barillot (Institut Curie)
Franck Rapaport (Memorial Sloan Kettering)
Laurence Calzone, Urszula Czerwinska (Institut Curie)
Marine Le Morvan (Ecole des Mines)
Inna Kuperstein (Institut Curie)

Reduced Google matrix:
Dima Shepelyansky (Université Paul Sabatier)
José Lages (Institut UTINAM)
Klaus Frahm (Université Paul Sabatier)

Funding from Agilent Thought Leader Award; CNRS ApliGoogle project; ACI IMPBIO KernelChip