Pattern storage in gene-protein networks


Page 1: Pattern storage in  gene-protein networks

Pattern storage in gene-protein networks

Ronald Westra

Department of Mathematics

Maastricht University

Page 2: Pattern storage in  gene-protein networks

Items in this Presentation

1. Problem formulation

2. Modeling of gene/protein interactions

3. Information Processing in Gene-Protein Networks

4. Information Storage in Gene-Protein Networks

5. Conclusions

Page 3: Pattern storage in  gene-protein networks

1. Problem formulation

How much genome is required for an organism to survive in this World?

Some observations ...

Page 4: Pattern storage in  gene-protein networks

Minimal genome sizes

Mycoplasma genitalium: 500 nm; 580 Kbp; 477 genes; 74% coding DNA; obligatory parasitic endosymbiont

Nanoarchaeum equitans: 400 nm; 460 Kbp; 487 ORFs; 95% coding DNA; obligatory parasitic endosymbiont

SARS CoV: 100 nm; 30 Kbp; 5 ORFs; 98% coding DNA; RNA virus

Page 5: Pattern storage in  gene-protein networks

Organisms like Mycoplasma genitalium, Nanoarchaeum equitans, and the SARS coronavirus exhibit a large number of complex and well-tuned behavioural patterns despite an extremely small genome.

A pattern of behaviour here is the adequate conditional sequence of responses of the gene-protein interaction network to an external input: light, oxygen stress, pH, pheromones, and numerous organic and inorganic molecules.

Page 6: Pattern storage in  gene-protein networks

Problem formulation

Questions:

* How do gene-protein networks perform computations, and how do they process real-time information?

* How is information stored in gene-protein networks?

* How do processing speed, computation power, and storage capacity relate to network properties?

Page 7: Pattern storage in  gene-protein networks

CENTRAL THOUGHT [1]

What is the capacity of a gene-protein network to store input-output patterns, where the stimulus is the input and the behaviour is the output?

How does the pattern storage capacity of an organism relate to the size of its genome n and the number of external stimuli m?

Page 8: Pattern storage in  gene-protein networks

CENTRAL THOUGHT [2]

Conjecture:

The task of reverse engineering a gene regulatory network from a time series of m observations is actually identical to the task of storing m patterns in that network.

In the first case an engineer tries to design a network that fits the observations; in the second case Nature selects those networks/organisms that best perform the input-output mapping.

Page 9: Pattern storage in  gene-protein networks

Requirements

For studying the pattern storage capacity of a gene-protein interaction system we need:

1. a suitable parametrized formal model

2. a method for fixing the model parameters with the given set of input-parameters

We will visit these items in the following slides ...

Page 10: Pattern storage in  gene-protein networks

2. Modeling the Interactions between Genes and Proteins

A prerequisite for the successful reconstruction of gene-protein networks is a suitable model of the dynamics of their interactions.

Page 11: Pattern storage in  gene-protein networks

Components in Gene-Protein networks

Genes: ON/OFF switches

RNA & proteins: vectors of information exchange between genes

External inputs: interact with higher-order proteins

Page 12: Pattern storage in  gene-protein networks

General state space dynamics

The evolution of the n-dimensional state-space vector x (gene expressions) depends on the p-dimensional inputs u, the parameters θ, and Gaussian white noise ξ.

Page 13: Pattern storage in  gene-protein networks

Example of a general dynamics network topology

[Figure labels: external inputs; genes/proteins; input-coupling; interaction-coupling]

Page 14: Pattern storage in  gene-protein networks

The general case is too complex

Strongly dependent on unknown microscopic details

Relevant parameters are unidentified and thus unknown

Therefore approximate interaction potentials and qualitative methods seem appropriate

Page 15: Pattern storage in  gene-protein networks

1. Linear stochastic state-space models

Following Yeung et al. 2003 and others

$$\dot{x} = A x + B u + \xi(t)$$

x : the vector (x1, x2,..., xn), where xi is the relative gene expression of gene i

u : the vector (u1, u2,..., up), where ui is the value of external input i (e.g. a toxic agent)

ξ(t) : white Gaussian noise
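As an illustration of the linear stochastic model on this slide (state x, input u, coupling matrices A and B, white Gaussian noise ξ), the sketch below integrates dx/dt = A x + B u + ξ(t) with a simple Euler-Maruyama step. The two-gene matrices, the noise amplitude `noise`, the step size and the function name `simulate_linear_sde` are illustrative assumptions, not values taken from the presentation.

```python
import math
import random


def simulate_linear_sde(A, B, u, x0, dt=0.01, steps=1000, noise=0.05, seed=0):
    """Euler-Maruyama integration of dx/dt = A x + B u + xi(t).

    A: n x n list of lists, B: n x p list of lists, u: constant input (length p).
    `noise` is an assumed noise amplitude; it is not given in the slides.
    """
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        drift = [
            sum(A[i][j] * x[j] for j in range(n))
            + sum(B[i][j] * u[j] for j in range(len(u)))
            for i in range(n)
        ]
        # The white-noise increment scales with sqrt(dt).
        x = [
            x[i] + dt * drift[i] + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for i in range(n)
        ]
        trajectory.append(list(x))
    return trajectory


# Two genes, one external input (all numbers are illustrative).
A = [[-1.0, 0.0],
     [0.5, -1.0]]   # gene 1 activates gene 2; both decay
B = [[1.0],
     [0.0]]         # the input acts on gene 1 only

# With noise switched off the trajectory relaxes towards x = (1, 0.5).
traj = simulate_linear_sde(A, B, u=[1.0], x0=[0.0, 0.0], noise=0.0)
final = traj[-1]
```

Setting `noise` above zero gives a fluctuating trajectory around the same relaxation curve; the seed only makes runs repeatable.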

Page 16: Pattern storage in  gene-protein networks

2. Piecewise Linear Models

Following Mestl, Plahte, Omholt 1995 and others

$b_{il}$ : sum of step functions $s^{+}$, $s^{-}$

Page 17: Pattern storage in  gene-protein networks

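3. More complex non-linear interaction models

Example: including quadratic terms

[Equations garbled in the transcript. Recoverable elements: a state equation for dx/dt containing the matrices A, R and B, the state x with a transposed term, the input u and a vector a; a rate equation for da_i/dt containing a_i and (1 - a_i) with x-dependent coefficients; and a sum over k of terms in w, x and a transpose.]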

Page 18: Pattern storage in  gene-protein networks

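Our mathematical framework for non-linear gene-protein interactions

[Equations garbled in the transcript. Recoverable elements: a state equation for dx/dt containing the matrices A, R and B, the state x with a transposed term, the input u and a vector a; a rate equation for da_i/dt containing a_i and (1 - a_i) with x-dependent coefficients; and a sum over k of terms in w, x and a transpose.]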

Page 19: Pattern storage in  gene-protein networks

3. Information processing in sparse hierarchic gene-protein networks

Consider a network as described before with only a few connections (= sparse) and where a few genes/proteins control a considerable number of the others (= hierarchic).

Page 20: Pattern storage in  gene-protein networks

Information Processing in random sparse Gene-Protein Interactions

[Figure panels: random sparse network, n = 64, k = 2; largest cluster therein]

Page 21: Pattern storage in  gene-protein networks

Information Processing in random sparse Gene-Protein Interactions

Now consider the information processing time (= number of iterations) necessary to reach all nodes (proteins) as a function of the number of connections (= number of non-zero elements) in the network.
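The processing-time experiment described here can be sketched as a breadth-first search on a random directed graph. This is a minimal sketch under stated assumptions: edges are drawn uniformly at random, the signal starts at a single source node, and one iteration moves the signal across one connection. The function names `random_sparse_network` and `processing_time` are hypothetical and the presentation does not specify how its networks were generated.

```python
import random
from collections import deque


def random_sparse_network(n, n_connections, seed=0):
    """Directed graph with `n_connections` distinct random edges, no self-loops."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_connections:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j:
            edges.add((i, j))
    successors = {i: [] for i in range(n)}
    for i, j in edges:
        successors[i].append(j)
    return successors


def processing_time(successors, source=0):
    """Iterations needed for a signal from `source` to reach every node.

    Returns None if some node is never reached.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in successors[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    if len(distance) < len(successors):
        return None
    return max(distance.values())


# Scan the number of connections for a network of n = 64 nodes.
n = 64
for n_connections in (64, 128, 256, 512, 1024):
    t = processing_time(random_sparse_network(n, n_connections, seed=1))
    print(n_connections, t)
```

Averaging `processing_time` over many seeds for each connection count would give a curve of the kind the next slide describes; a single seed, as here, only shows the mechanics.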

Page 22: Pattern storage in  gene-protein networks

phase transition from slow to fast processing

Page 23: Pattern storage in  gene-protein networks

Page 24: Pattern storage in  gene-protein networks

4. Memory storage in gene-protein networks

Philosophy: Information is stored in the network topology (weights, sparsity, hierarchy) and the system dynamics

* Ben-Hur, Siegelmann: Computation in Gene Networks, Chaos, January 2004

* Skarda and Freeman: How brains make chaos in order to make sense of the world, Behavioral and Brain Sciences, Vol. 10, 1987

Page 25: Pattern storage in  gene-protein networks

Memory storage in gene-protein networks

We assume a hierarchic, non-symmetric, and sparse gene/protein network (with k out of n possible connections per node) with linear state space dynamics.

Suppose we want to store M patterns in the network.

Page 26: Pattern storage in  gene-protein networks

Linearized form of a subsystem

The first-order linear approximation of the system separates the state vector x and the inputs u:

$$\frac{dx}{dt} = A x + B u$$

Page 27: Pattern storage in  gene-protein networks

Input-output pattern:

The organism has learned, through evolution, to react to an external input u (e.g. toxic agent, viral infection) with a gene-protein activity x(t).

This combination (x,u) is the input-output PATTERN.

Page 28: Pattern storage in  gene-protein networks

Memory Storage = Network Reconstruction

Using these definitions it is possible to map the problem of pattern storage to the *solved* problem of gene network reconstruction with sparse estimation.

Page 29: Pattern storage in  gene-protein networks

Information Pattern:

Now suppose that we have M patterns that we want to store in the network, collected as the columns of the matrices $\dot{X}$ (state derivatives), X (states) and U (inputs).

Page 30: Pattern storage in  gene-protein networks

Pattern Storage: method 1.0

The relation between the desired patterns (state derivatives, states and inputs) defines constraints on the system matrices A and B, which have to be computed.

Page 31: Pattern storage in  gene-protein networks

Pattern Storage: method 1.0

Computing the optimal A and B for storing the Patterns

The matrices A and B are sparse (most elements are zero).

Using techniques from robust/sparse optimization, this problem can be defined as:

$$\min_{A,B} \; \|A\|_1 + \|B\|_1 \quad \text{subject to:} \quad \dot{X} = A X + B U$$
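One possible way to solve the sparse storage problem of method 1.0 numerically is to treat each row of [A B] as a separate linear program that minimizes the L1 norm subject to the pattern constraints. The following is a sketch, assuming NumPy and SciPy's `linprog` with the HiGHS solver are available; the helper name `store_patterns_l1`, the problem sizes and the random patterns are hypothetical, and nothing here is claimed to be the solver used for the results in this presentation.

```python
import numpy as np
from scipy.optimize import linprog


def store_patterns_l1(X, U, Xdot):
    """Row-wise L1-minimal A, B with Xdot = A X + B U (hypothetical helper).

    X: n x M states, U: p x M inputs, Xdot: n x M state derivatives.
    """
    n, M = X.shape
    W = np.vstack([X, U])          # (n + p) x M stacked states and inputs
    d = W.shape[0]
    Z = np.zeros((n, d))
    for i in range(n):
        # Split z = z_plus - z_minus with both parts >= 0, so that
        # |z|_1 = sum(z_plus + z_minus) becomes a linear objective.
        c = np.ones(2 * d)
        A_eq = np.hstack([W.T, -W.T])          # M x 2d
        res = linprog(c, A_eq=A_eq, b_eq=Xdot[i],
                      bounds=(0, None), method="highs")
        if not res.success:
            raise RuntimeError(res.message)
        Z[i] = res.x[:d] - res.x[d:]
    return Z[:, :n], Z[:, n:]


# Illustrative data: a sparse "true" system and M random patterns.
rng = np.random.default_rng(0)
n, p, M, k = 8, 2, 6, 2
A_true = np.zeros((n, n))
B_true = np.zeros((n, p))
for i in range(n):
    for col in rng.choice(n + p, size=k, replace=False):
        if col < n:
            A_true[i, col] = rng.normal()
        else:
            B_true[i, col - n] = rng.normal()
X = rng.normal(size=(n, M))
U = rng.normal(size=(p, M))
Xdot = A_true @ X + B_true @ U

A_hat, B_hat = store_patterns_l1(X, U, Xdot)
```

The solution always reproduces the stored patterns and has an L1 norm no larger than that of the generating system; whether it coincides with the generating system depends on k, M and the network size, which is the regime the retrieval-error slides explore.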

Page 32: Pattern storage in  gene-protein networks

Number of retrieval errors as a function of the number of nonzero entries k, with M = 150 patterns and N = 50000 genes.

First-order phase transition from error-free memory retrieval, marked k_C in the figure.

Page 33: Pattern storage in  gene-protein networks

Number of retrieval errors versus M with fixed N = 50000, k = 10.

First-order phase transition to error-free memory retrieval, marked k_C in the figure.

Page 34: Pattern storage in  gene-protein networks

Critical number of patterns M_crit versus the problem size N.

Page 35: Pattern storage in  gene-protein networks

Pattern Storage: method 2.0

A pattern corresponds to a converged state of the system, hence:

$$\frac{dx}{dt} = 0$$

Therefore a sparse system Σ = {A,B} is sought that maps the inputs to the patterns {U,X}, which leads to:
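To make the notion of a pattern as a converged state concrete, the sketch below relaxes a small linear system dx/dt = A x + B u under a constant input until the derivative vanishes. It assumes a stable A with negative diagonal; the matrices, the input value, the tolerance and the function name `relax_to_pattern` are illustrative choices, not taken from the presentation.

```python
def relax_to_pattern(A, B, u, dt=0.01, tol=1e-10, max_steps=200_000):
    """Integrate dx/dt = A x + B u from x = 0 until dx/dt is numerically zero."""
    n = len(A)
    x = [0.0] * n
    for _ in range(max_steps):
        dxdt = [
            sum(A[i][j] * x[j] for j in range(n))
            + sum(B[i][j] * u[j] for j in range(len(u)))
            for i in range(n)
        ]
        if max(abs(d) for d in dxdt) < tol:
            return x
        x = [x[i] + dt * dxdt[i] for i in range(n)]
    raise RuntimeError("no stationary state reached; A may be unstable")


# Two genes with decay on the diagonal, one external input (illustrative).
A = [[-1.0, 0.0],
     [0.5, -2.0]]
B = [[1.0],
     [0.0]]
u = [2.0]

x_star = relax_to_pattern(A, B, u)

# At the stored pattern the stationarity condition A x + B u = 0 holds.
residual = [
    sum(A[i][j] * x_star[j] for j in range(2)) + B[i][0] * u[0]
    for i in range(2)
]
```

For this example the stationary state can also be found by hand: the first row gives x1 = 2 and the second gives x2 = 0.5, which is what the relaxation converges to.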

Page 36: Pattern storage in  gene-protein networks

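Computing optimal sparse matrices

LP:

$$\{A^*, B^*\} = \arg\min_{A \in \mathbb{R}^{n^2},\, B \in \mathbb{R}^{np}} \; \lambda \|A\|_1 + (1 - \lambda) \|B\|_1$$

(the symbol of the weighting factor is lost in the transcript; λ is used here)

subject to:

1. condition for stationary equilibrium:

$$\dot{X} = 0 \;\Rightarrow\; A X + B U = 0$$

2. condition to avoid A = B = 0:

[Constraint garbled in the transcript; recoverable symbols: q, a hat accent, a transpose T, A, B and the value 1]

3. avoid A = 0 by using degradation of proteins and auto-decay of genes: diag(A) < 0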

Page 37: Pattern storage in  gene-protein networks

The sparsity in A and B

The sparsity of the gene/protein interaction matrix A is measured by kA, the number of non-zero elements in A.

This can be scaled to the size of A, N, which gives:

pA = kA/N.

Similarly, for the input-coupling B:

pB = kB/P.
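The sparsity measures can be computed directly from a pair of matrices. This is a small sketch that assumes "size" means the total number of entries of the matrix, which is one reading of the slide; the helper name `sparsity`, the threshold `tol` and the example matrices are illustrative.

```python
def sparsity(matrix, tol=0.0):
    """Return (k, p): the number and the fraction of entries with |value| > tol."""
    entries = [v for row in matrix for v in row]
    k = sum(1 for v in entries if abs(v) > tol)
    return k, k / len(entries)


# Illustrative 3-gene, 2-input system.
A = [[-1.0, 0.0, 0.0],
     [0.5, -1.0, 0.0],
     [0.0, 0.0, -2.0]]
B = [[1.0, 0.0],
     [0.0, 0.0],
     [0.0, 0.3]]

k_A, p_A = sparsity(A)   # 4 of 9 entries are non-zero
k_B, p_B = sparsity(B)   # 2 of 6 entries are non-zero
```

For matrices produced by a numerical solver a small positive `tol` is usually needed, because entries that should be zero come out as tiny floating-point values.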

Page 38: Pattern storage in  gene-protein networks

Results

[Figure panels: A (gene-gene) and B (input-gene)]

Page 39: Pattern storage in  gene-protein networks

[Figure panels: A (gene-gene) and B (input-gene)]

Page 40: Pattern storage in  gene-protein networks

Sparsity versus the number of stored patterns

There are three distinct regions with different ‘learning’ strategies, separated by second-order phase transitions.

[Figure panels: A (gene-gene) and B (input-gene)]

Page 41: Pattern storage in  gene-protein networks

Sparsity versus the number of stored patterns

Region I : all information is exclusively stored in B.

Region II : information is preferably stored in A.

Region III : no clear preference for A or B.

[Figure panels: A (gene-gene) and B (input-gene); figure annotations: highest ‘order’, highest ‘disorder’]

Page 42: Pattern storage in  gene-protein networks

Sparsity versus the number of stored patterns

I : ‘impulsive’

II : ‘rational’

III : ‘hybrid’

[Figure panels: A (gene-gene) and B (input-gene)]

Page 43: Pattern storage in  gene-protein networks

Phase transitions and entropy

The entropy of the macroscopic system relates to the relative fraction of connections pA and pB as:

$$S_A = -p_A \log p_A - (1 - p_A) \log(1 - p_A)$$

$$S_B = -p_B \log p_B - (1 - p_B) \log(1 - p_B)$$

As A and B are indiscernible the total entropy is:

$$S_M = S_A + S_B$$
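The macroscopic entropy can be evaluated numerically from the two connection fractions. The sketch below reads the slide's expressions as the standard binary entropy, S(p) = -p log p - (1 - p) log(1 - p), summed for A and B; the natural logarithm and the function names `binary_entropy` and `macroscopic_entropy` are assumptions made for illustration.

```python
import math


def binary_entropy(p):
    """S(p) = -p log p - (1 - p) log(1 - p), with S(0) = S(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def macroscopic_entropy(p_A, p_B):
    """Total entropy S_M = S_A + S_B for connection fractions p_A and p_B."""
    return binary_entropy(p_A) + binary_entropy(p_B)


# A sparse A with a denser B, then the maximally disordered case.
s_sparse = macroscopic_entropy(0.05, 0.30)
s_max = macroscopic_entropy(0.5, 0.5)
```

The entropy is largest when both fractions equal one half and vanishes when each matrix is either empty or full, so very sparse networks sit at the low-entropy end of the scale.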

Page 44: Pattern storage in  gene-protein networks

Information entropy

The entropy of the microscopic system A relates to the degree distribution: the number of connections v_i of node i.

$$S = -p_A \log p_A - (1 - p_A) \log(1 - p_A) - \sum_{i=1}^{N} v_i \log v_i$$

Let P(v) be the probability that a given node has v outgoing connections:

$$\int_0^\infty P(v)\,dv = 1 \quad \text{and} \quad \int_0^\infty v\,P(v)\,dv = p_A$$

$$S = S_M - \int_0^\infty P(v)\, v \log v \, dv$$

Page 45: Pattern storage in  gene-protein networks

Information entropy [2]

With P the Laplace distribution, for large networks the average entropy per node, s = S/N, converges to:

[Expression garbled in the transcript; recoverable symbols: S_M, N, p_A, log p_A and γ_E]

with Euler's constant γ_E = 0.5772...

Page 46: Pattern storage in  gene-protein networks

Information gain per node

This also allows the computation of the gain in information entropy if one connection is added:

[Expression garbled in the transcript; recoverable symbols: s and N]

If this formalism is applied to our network structure we obtain:

Page 47: Pattern storage in  gene-protein networks

Information gain per node

Left: the entropy S versus […] for n = 100, p = 30, based on 1180 observations. Right: the gain in entropy for the same data set.

Again the three learning strategies are clearly visible: impulsive, rational, hybrid.

Page 48: Pattern storage in  gene-protein networks

Relation between sparsities

Relation between pA = kA/n and pB = kB/p, averaged over 10116 measurements.

Page 49: Pattern storage in  gene-protein networks

5. Conclusions

Non-linear time-invariant state space models for gene-protein networks exhibit a range of complex behaviours for storing input-output patterns in sparse representations.

In this model, information processing (= computing) and pattern storage (= learning) exhibit multiple distinct first-order and second-order (continuous) phase transitions.

There are two second-order phase transitions that divide the network learning into three distinct regions: ‘impulsive’, ‘rational’, ‘hybrid’.

Page 50: Pattern storage in  gene-protein networks

Other members of the trans-national University Limburg Bioinformatics Research Team

University of Hasselt (Belgium):

• Goele Hollanders (PhD student)
• Geert Jan Bex
• Marc Gyssens

University of Maastricht (Netherlands):

• Stef Zeemering (PhD student)
• Karl Tuyls
• Ralf Peeters

Page 51: Pattern storage in  gene-protein networks

Discussion …

Ronald Westra

Department of Mathematics

Maastricht University