Computational Systems Biology Lectures 6 and 7: Reconstruction and structural analysis ... ·...

Computational Systems Biology

1


Lectures 6 and 7: Reconstruction and structural analysis of

biological networks

Hongwu MaComputational Systems Biology Group

10 February 2006


2

Simple models

1210321 XSSX EEE ⎯→⎯⎯→⎯⎯→⎯

An enzyme reaction Enzyme kinetics

A linear pathway

BA E⎯→⎯

Metabolic control analysis


3

The reality

Glycolysis pathway The whole network


4

Systems Biology comes of age

0

100

200

300

400

1997 1999 2001 2003 2005

SB institute in seattle

1st ICSB in Japan

Review paper by L hood

Special issues in Nature and Science

1996, Systems biology unit, Glaxo WellcomeMedicines Research Centre

No. of Pubm

edpapers


5

The background

• Fully sequenced genomes for several hundreds organism

• High-throughput methods for measurement at whole system level (more data with less experiments)

• Various internet available databases make it easy to collect and integrate information at system level.


6

But how to study SB

• Methods for constructing the system (network) from available information

• Methods for theoretical analysis of such complex systems

• How “system analysis” contribute to biology


7

Reconstruction of metabolic networks

• Organism specific MN from its genome• High throughput reconstruction from

databases• High quality network: Human curation


8

Metabolic network reconstruction

Genes

b0001b0002b0003……..

Enzymes

EC 1.1.1.3EC 1.1.1.6EC 1.1.1.17

………...

Reactions

D-Glucose + ATP = D-G6P + ADPD-G6P = D-F6P

………………………


9

Reconstruction from genome

Database

Gene annotation

Network

CTGAGGTCGACTCTAGAGGATCCCCCTTCCAGATGTGTAAGTTA

TTTGAGTGAAATAGTGCCAGTTTAACCATAGTTCTAGTAAGCT

TTTGAGTGAAATAGTGCCAGTTTAACCATAGTTCTAGTAAGCTG

Genome sequence

ORF

Gene recognition

Sequence similarity search, Blast

ID Name start end functionEG11277 thrL 190 255 Regulatory leader peptide for thrABC operoEG10998 thrA 337 2799 Aspartokinase I-homoserine dehydrogenasEG10999 thrB 2801 3733 Homoserine kinase [EC 2.7.1.39]EG11000 thrC 3734 5020 Threonine synthase [EC 4.2.3.1]


10

High throughput reconstruction from enzyme databases

http://www.genome.ad.jp/dbget-bin/www_bget?enzyme+1.1.1.81ENTRY EC 1.1.1.81 NAME Hydroxypyruvate reductase CLASS Oxidoreductases Acting on the CH-OH group of donors With NAD+ or NADP+ as acceptor SYSNAME D-Glycerate:NADP+ 2-oxidoreductase REACTION D-Glycerate + NAD+ or NADP+ = Hydroxypyruvate + NADH or NADPH PATHWAY PATH: MAP00260 Glycine, serine and threonine metabolism PATH: MAP00630 Glyoxylate and dicarboxylate metabolism GENES YPE: YPO2536 PAE: PA1499 RSO: RS03094(ttuD1) RS05749(ttuD2) MLO: mlr5146 SME: SMa1406(ttuD3) SMb20678(ttuD2) SMc04389(ttuD1) ATU: Atu3232 Atu5334


11

Useful databases• KEGG: http://www.genome.ad.jp/kegg/• Biocyc: http://biocyc.org/• Expasy: http://www.expasy.ch• Brenda: http://www.brenda.uni-koeln.de/

Ability required: writing program to extract information from websites or downloadable text files.


12

From Enzyme to reaction

For EC 5.3.1.9:

4 reactions in KEGG LIGAND database

2 reactions in WIT database

1 reaction in IUBMB and Expasy databases

1 reaction in BRENDA database

1 reaction in EcoCyc

An enzyme may have wide substrate catalysis activity. E.g. 1.1.1.2: an alcohol + NADP+ = an aldehyde + NADPH + H+


13

KEGG LIGAND reaction databaseMore than 6000 reactions

………

ENTRY R02739

NAME alpha-D-Glucose 6-phosphate ketol-isomerase

DEFINITION alpha-D-Glucose 6-phosphate <=> beta-D-Glucose 6-phosphate

EQUATION C00668 <=> C01172

ENZYME 5.3.1.9 5.1.3.15

………

irreversibility


14

E. Coli Metabolic network from KEGG

Reaction

Metabolite

638 Enzymes1081 reactions

Ma & Zeng (2003) Bioinformatics. 19, 270


15

Problems in the high throughput methods

•Nonenzyme catalyzed reactions link

•Uncharacted enzymes like 1.-.-.-

•Unsequensed enzymes (gene for a known enzyme not identified, more than 40 in E. coli) link

•Wrongly annotated enzyme genes (the annotations in KEGG, Ecocyc and Expasy for E. coli are very different)


16

Enzyme databases

Annotated genome

Enzymes

EC 1.1.1.3EC 1.1.1.6EC 1.1.1.17

………...

Reactions

R0004R0007R0100

………...Metabolic network

High quality network reconstruction

KEGGExpasyBrenda

Human curation


17

Available high quality networks

E. coli: Ecocyc, Palsson’s group

Yeast: Palsson and Nielsen’s group

Helicobacter pylori

Haemophilus influenzae

Human?

link


18

Structural analysis of metabolic networks

• Graph representation• Structure properties• Network decomposition• Software for network analysis


19

Mathematical representation of MN

⎟⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜⎜

⎝

⎛

0010000110110100110100010

132313

6

PGPTPT

FDPPF

Glyceraldehyde 3-P

Fructose 1,6-P

Fructose 6-PH2O, Pi

1,3-P Glycerate

NAD, Pi

NADH

ATP

ADP

Glycerone P

r1 r2

r3

r4r5

T3P2

F6P

FDP

T3P1

⎟⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜⎜

⎝

⎛

−−−−

−−

−−

0111001010000000001100000000011101100000001100001100011

5

4

3

2

1

rrrrr

13PG

ATP

ADP

NAD

H

NAD

Pi

H

2O

Stoichiometric matrix Connectivity matrix

⎟⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜⎜

⎝

⎛

0110010100110100000100110

5

4

3

2

1

rrrrr


20

Connectivity matrix to graph

r1 r5

r4

r2

r3

T3P2

F6P

13PG

FDP

T3P1

Metabolite graph

Reaction graph

⎟⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜⎜

⎝

⎛

0010000110110100110100010

132313

6

PGPTPT

FDPPF

Connectivity matrix

⎟⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜⎜

⎝

⎛

0110010100110100000100110

5

4

3

2

1

rrrrr


21

Currency metabolites in graph analysis

Otherwise path length will be 2 instead of 9 (Jeong et al. 2000 Nature 407:651)

G l u c o s e

G l u c o s e 6 - P

G l y c e r a l d e h y d e 3 - P

F r u c t o s e 1 , 6 - P

F r u c t o s e 6 - P

2 - P G l y c e r a t e

P h o s p h o e n o l p y r u v a t e

A T P

A D P

H 2 O , P i

3 - P G l y c e r a t e

1 , 3 - P G l y c e r a t e

H 2 O

N A D , P i

N A D H

P y r u v a t e

A D P

A T P

A T PA D P

G l y c e r o n e P

A D PA T P

From glucose to pyruvate, ADP can not be used as a link.

Currency metabolites


22

With or without currency metabolites

Metabolic network of S. pneumonia (616 reactions)


23

Structural properties of large scale network

• Connection degree distribution• Path length (efficiency)• Node centrality (the most important nodes)• Connectivity of the network• Modules in the network• Network motifs


24

Connection degree distribution

0. 001

0. 01

0. 1

1

1 10 100K

P(K)

out puti nput

γ−= akkP )(

P(k): Percentage of nodes with a connection degree k or not less than k (Cumulative distribution).

Power law degree distribution indicates a scale free network

The few highly connected nodes are called hubs in the network


25

Hub metabolites

Glycerate-3-phosphate, D-Ribose-5-phosphate, Acetyl-CoA, Pyruvate, D-Xylulose 5-phosphateD-Fructose 6-phosphate, 5-Phospho-D-ribose 1-diphosphate, L-Glutamate, D-Glyceraldehyde 3-phosphate, L-Aspartate, Propanoyl-CoA, Malonyl-ACP, Succinate, Acetate, Isocitrate, Fumarate


26

Average path lengthPath length: number of the steps in the shortest paths from one node to anotherAPL: average of the path lengths for all connected pairs of nodes

Breadth first searching method to find the shortest paths from one node to all other nodes


27

Average path length in MNJeong et al. (2000) Nature: constant AL (about 3) for all the 43 organisms studied.

02468

1012

0 200 400 600 800 1000

Node number

Aver

age

path

leng

th

EukaryoteBacteriaArchaea

Structural differences among MNs of the three domains of organism

Ma & Zeng (2003) Bioinformatics. 19, 270


28

Node Centrality

dyxdnxC

xyUy

1),(

1)( ,

=−

=∑ ≠∈

Closeness centrality of node x:

d(x,y) the path length between node x and node yU the set of all nodes

average path length between x and the other nodesd

The central nodes have short path lengths to other nodes in the network


29

The most central metabolites in the metabolic network of E. coli

5.06G3P

5.03Acetaldehyde

4.98Acetate

4.932KD6PG

4.89Malate

4.76Actyl-CoA

4.44Pyruvate

Mean distanceMetabolite

nodes connectedhighly nodes centralmost ≠


30

Network Centrality

)2)(1())(()32( *

−−

−−= ∑ ∈

nnxCCn

C Ux

n node number in the networkC(x) the closeness centrality for node xC* the highest value of node closeness centrality

circle network star network

C = 0 C = 1

Network closeness centralization index

Distribution of the node centrality in the network


31

Average path length vs. network closeness centralization index

0

2

4

6

8

10

12

0 0.05 0.1 0.15 0.2 0.25

OCCI

ALG

EukaryoteBacteriaArchaea

C


32

Network Connectivity

Degree distribution tell nothing about global connectivity

The right network can have short average path length though not connected at all


33

Strongly or weakly connected components

Fully connected subsets depending on if direction is considered


34

SC in metabolic network

72

15

5 3 1 2 10

10

20

30

40

50

60

70

80

2 3 4 5 6 7 8 9 10 11 274

Size of components

Num

ber

of c

ompo

nent

s

Num

ber

GSC: Giant strong component

One big SC and many small SCs


35

Connectivity structure of MN

Giant strong component (GSC)metabolites fully converted and

convertible to each other

Substrate subset (93)converted to metabolite in GSC

Product subset (161)produced from metabolites in GSC

274 out of total 811 metabolites

Isolated subset (283)


36

Bow-tie: a general structure of biological and physical networks

Giant Strong Component

disconnected components

OUTsubset

INsubset

• Metabolic network• Signal transduction• Web page• Material processing and other tech. systems


37

The average path length of the GSC determines that of the whole network

0

4

8

12

0 2 4 6 8 10 12

ALW

ALG

Path length of whole network

Path

leng

th o

f GSC


38

Other structure parameters

• Cluster coefficient• Betweeness centrality• Reachability• Clique and core


39

Pajek, the right software• Written in C, very fast, many functions, 1M• Free and updated with new function frequently• good manual, theory introduction with how-to-do in

the software• Book available: Exploratory Social Network

Analysis with Pajek• http://vlado.fmf.uni-lj.si/pub/networks/pajek/ h


40

Other tools

• Cytoscape http://www.cytoscape.org/• Ucinet

http://www.analytictech.com/ucinet.htm• Bioconductor and R• Matlab


41

Network decomposition

• Identifying a somehow independent subnetwork (modules) from such complex networks for biological function analysis or developing more detailed kinetic model


42

WHYUniversity of Edinburgh

CSE CMVM CHSS

S1 S2 S3 S1 S2 S3 S1 S2 S3


43

A Method• Define a distance between two nodes: the shortest path

between node A and node B• Distance matrix a hierarchical classification tree


44

Hierarchical decomposition of GSC

6

4

3

2

1

5

1

4

3

2

5

5


45

Biological function of these modules

1. Aspartate metabolism, Thr, Lys synthesis2. Glutathione and THF metabolism 3. Glutamate metabolism, Arg, Pro synthesis4. galacterate, glycerate metabolism, Ser synthesis5. PP pathway, upper part of glycolysis pathway,

sugar, aminosugar and glycerol metabolism6. TCA cycle and glyoxylate cycle, pyruvate

metabolism, AcCoA, AcAcCoA metabolism

Modules identified by the decomposition method have true biological meaning


46

Transcriptional regulation: biological basis

• Transcription unit: operon• Sigma factor• Transcription factor (TF)• Promotor• TF binding site• More complex in Eukaryotes

Transcription unit

RNA polymerase

Sigma factor

Promotor

TFs

Binding site Genes

Interactions between proteins and genes


47

Reconstruction of MN and TRN

E. coli

P. aeruginosa

A1 B1

A2 B2

1.2.1.3

mA

mB

TF

+

+?

Are the orthologous genes in different organisms regulated in similar ways?


48

Transcriptional regulatory networks

• Can not be reconstructed directly from gene annotation information

• data is mainly from database which collect information from literatures (for E. coli and B. subtilis)

• ChIP on chip technology for high throughput reconstruction (mainly yeast network)


49

Available databases

• RegulonDB(http://www.cifn.unam.mx/Computational_Genomics/regulondb/)

• Ecocyc (ecocyc.org)• DBTBS (B. subtilis) (dbtbs.hgc.jp)• Prodoric Net (Prokaryote) (prodoric.tu-bs.de)• TRANSFAC (Eukaryotes)

(www.biobase.de/pages/products/transfac.html)


50

E. coli regulatory network1278 genes 2724 interactions157 genes for TFs382 metabolic enzyme genes (Blue)

Ma et al. (2004) Nucleic Acid Research 32, 6643


51

Connectivity analysisA giant weakly connected component with 1166 nodes

No strongly connected component exists, indicating no multiple gene regulatory loop exist, impling a somehow acyclic structure.


52

Method to find regulatory hierarchy


53

The nine-layer hierarchical structure

Average path length: 1.85

rpoE

fis

soxR

cspA

rpoS

phoB

crp

ihf

argP

cspE

rpoNdnaA

hnscytR

Like the organization of the university?


54

Network motifSubgraphs that appear in the network at frequencies much higher than those found in randomized networks (Shen-Oorret al., Nature Genetics, 31:64).

Regarded as elementary building blocks of network

Milo et al. Network motifs: simple building blocks of complex networks.Science. 2002, 298(5594):824-7.


55

Network motifs in regulatory network

Feed forward loops

712 in the network

Bi-fan motif

11935


56

Motifs in the network

Blue: FFLs

Yellow: Bi-fan motifs

Green: Both


57

Two kinds of FFLs

+

+

++

−

+

Coherent FFLs Incoherent FFL

Direct effect of the upper regulator A on the target gene C is consistent with its indirect effect through B.

Direct effect is inconsistent with the indirect effect.

C

BB

AA

C

Mangan and Alon, (2003) PNAS, 100:11980


58

Motif distributionCoherent FFLs: 330

+

−

++

−

−−

−

−−

+−

+

++

+

+−

−

++type

number 265 28 7 30 119 9 8 16

examples

flhC-fliA-

fliGH

cpxR-csgD-csgA

fis-hns-cysG

fnr-narL-dcuB

crp-nagC-manX

YZ

arcA-betI-

betAB

ihf-flhD-nrfA

fnr-narL-

moeAB

−

−

+

Incoherent FFLs: 152


59

Motifs often interact with each other

hns

gadW

gadA

gadX

crprpoS

gadA: glutamate decarboxylase (in aminobutarate pathway, a by path of TCA cycle), important in oxidative stress and acid stress response


60

references

H.W. Ma and A.P. Zeng. (2003) Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics. 19(2)270-277. H.W. Ma and A.P. Zeng. (2003) The connectivity structure, giant strong component and centrality of metabolic networks. Bioinformatics. 19(11)1423-1430. H.W. Ma, et al (2004) An extended transcriptional regulatory network of Escherichia coli and its structural analysis, Nucleic Acid Research, 32:6643Barabási and Oltvai (2004) Network Biology: Understandingthe cell‘s functional organization. Nature Rev. Genet. 5, 101-114.

Computational Systems Biology Lectures 6 and 7: Reconstruction and structural analysis ... ·...

Documents

Transcript of Computational Systems Biology Lectures 6 and 7: Reconstruction and structural analysis ... ·...