BENG 203: Genomics, Proteomics & Network Biology Trey Ideker Vineet Bafna Inferring gene regulatory networks


BENG 203: Genomics, Proteomics & Network Biology

Trey Ideker
Vineet Bafna

Inferring gene regulatory networks


Reading assignment

Gardner, di Bernardo, Lorenz, and Collins. Inferring Genetic Networks and Identifying Compound Mode of Action via Expression Profiling. Science 301, pp.102-105 (2003)

Novershtern N, Subramanian A, Lawton LN, Mak RH, Haining WN, McConkey ME, Habib N, Yosef N, Chang CY, Shay T, Frampton GM, Drake AC, Leskov I, Nilsson B, Preffer F, Dombkowski D, Evans JW, Liefeld T, Smutko JS, Chen J, Friedman N, Young RA, Golub TR, Regev A, Ebert BL. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell 144(2), pp.296-309 (2011)


Early efforts for network inference: OUTLINE

• Boolean Networks
  – Gene expression state space
  – Discrete Dynamical Systems

• Reverse Engineering of Networks
  – Entropy
  – Mutual Information
  – REVEAL algorithm

• Other methods:
  – Linear and non-linear regression
  – Bayesian inference methods


From tutorial by D’Haeseleer, Liang, and Somogyi PSB (2001)



Where should state 4 appear?


State space and attractor basins


What are some biological interpretations of basins and attractors?


Entropy “H”

• Measures amount of information in a signal

High information content = high disorder

11111111: Low
00000000: Low
01011100: High

• $H(x) = -p_1 \log_2 p_1 - p_2 \log_2 p_2 - \dots - p_m \log_2 p_m$

Arguments $p_1, p_2, \dots, p_m$ are the probabilities (frequencies) for the m possible values of a signal x (the probabilities must sum to 1)

• Maximum entropy is obtained when all values are equally likely; it approaches 0 when one value dominates

What is the maximum entropy for a binary signal?
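The entropy formula can be checked numerically on the three example signals. The following is a minimal sketch, assuming the probabilities are estimated as symbol frequencies within each signal; the function name `entropy` is an illustrative choice, not something from the slides.

```python
import math
from collections import Counter

def entropy(signal):
    """Shannon entropy, in bits, of a sequence of discrete symbols.

    Probabilities are estimated as the frequency of each symbol.
    """
    counts = Counter(signal)
    n = len(signal)
    # Sum of -p * log2(p) over the observed symbols
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

h_ones = entropy("11111111")    # one value dominates completely
h_zeros = entropy("00000000")
h_mixed = entropy("01011100")   # four 0s and four 1s
print(h_ones, h_zeros, h_mixed)
```

Under these assumptions the two constant signals have zero entropy, and the mixed signal, with both values equally frequent, reaches 1 bit, which is the maximum for a binary signal.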


[Figure: Entropy for m = 2. H (vertical axis, 0 to 1) plotted against p(x) (horizontal axis, 0 to 1).]


What is the maximum entropy possible in this case?


Mutual Information

$$M(X,Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$

Indicates ability to predict value of one variable given the value of the other.

A low value = low predictive ability (independence)
A high value = high predictive ability
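As a rough illustration of the two extremes, the double sum can be evaluated on paired observations. This sketch assumes base-2 logarithms (so that M is in bits, matching the entropy definition) and frequency-based probability estimates; the function name and the toy signals are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """M(X,Y) in bits from paired observations, via the double sum over (x, y)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px = Counter(xs)               # marginal counts for X
    py = Counter(ys)               # marginal counts for Y
    m = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        m += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return m

# Y is an exact copy of X: knowing one fully predicts the other
m_copy = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])

# X and Y are independent: every (x, y) pair is equally frequent
m_indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(m_copy, m_indep)
```

In the copied case M equals the full entropy of the signal (1 bit); in the independent case every log term is log2(1) = 0, so M is 0.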


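Venn diagram representation

[Figure: two overlapping circles, H(X) and H(Y). The overlap is M(X,Y); the non-overlapping parts are H(X|Y) and H(Y|X); the union is H(X,Y).]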


[Figure: wiring diagram from genes A, B, C to their next states A′, B′, C′]

A′ = B
B′ = A or C
C′ = (A and B) or (B and C) or (A and C)
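The update rules above define a discrete dynamical system over 2^3 = 8 states, so the state space and its attractor basins can be enumerated directly. The sketch below assumes synchronous updates (all three genes change at once) and encodes a state as the tuple (A, B, C); the helper names are illustrative.

```python
from itertools import product

def step(state):
    """Apply the three update rules once to a state (A, B, C) of 0/1 values."""
    a, b, c = state
    return (b, a or c, (a and b) or (b and c) or (a and c))

def attractor_of(state):
    """Follow the trajectory from `state` until it repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    # Everything from the first repeated state onward is the attractor cycle
    return frozenset(seen[seen.index(state):])

# Group all 8 states by the attractor they eventually fall into (the basins)
basins = {}
for s in product((0, 1), repeat=3):
    basins.setdefault(attractor_of(s), []).append(s)

for attractor, members in basins.items():
    print(sorted(attractor), "<-", members)
```

For these rules the enumeration finds three attractors: the fixed points 000 and 111, and a two-state cycle between 010 and 100.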


REVEAL Algorithm

• To determine the gene input(s) for gene output Y′, identify any gene X for which:

H(X,Y′) = H(X)

• That is, the entropy of the joint output/input is no greater than the entropy of the input alone, i.e., the output is completely determined by the input.

• An alternate view in terms of mutual information: M(X,Y′) / H(Y′) = 1
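The entropy criterion and the mutual-information criterion say the same thing. A short derivation, assuming the standard identities $H(X,Y') = H(X) + H(Y'|X)$ and $M(X,Y') = H(Y') - H(Y'|X)$ (not stated on the slide) and $H(Y') > 0$:

```latex
\begin{aligned}
H(X,Y') = H(X)
  &\iff H(Y'|X) = 0 \\
  &\iff M(X,Y') = H(Y') - 0 = H(Y') \\
  &\iff \frac{M(X,Y')}{H(Y')} = 1
\end{aligned}
```

The middle line is the statement that no uncertainty about Y′ remains once X is known.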


Example: A′ is completely predicted by B

What about other gene outputs?


REVEAL (continued)
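The search can be extended from single inputs to pairs and triples of inputs. The following is a minimal sketch of that idea, not the published implementation: it assumes the full state-transition table of the three-gene example (A′ = B, B′ = A or C, C′ = majority of A, B, C) is available, tries input sets of increasing size, and accepts the smallest sets X for which H(X,Y′) = H(X). All function names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations, product

GENES = ("A", "B", "C")

def step(state):
    """Update rules of the three-gene example network."""
    a, b, c = state
    return (b, a or c, (a and b) or (b and c) or (a and c))

def entropy(observations):
    """Shannon entropy in bits, with probabilities estimated as frequencies."""
    counts = Counter(observations)
    n = len(observations)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def reveal(inputs, outputs, genes=GENES):
    """For each output gene, find the smallest input sets X with H(X, Y') = H(X)."""
    wiring = {}
    for j, gene in enumerate(genes):
        y = [out[j] for out in outputs]
        for k in range(1, len(genes) + 1):           # try k = 1, 2, 3 inputs
            hits = []
            for combo in combinations(range(len(genes)), k):
                x = [tuple(s[i] for i in combo) for s in inputs]
                joint = [xi + (yi,) for xi, yi in zip(x, y)]
                # Tolerance guards against floating-point round-off
                if math.isclose(entropy(joint), entropy(x), abs_tol=1e-9):
                    hits.append(tuple(genes[i] for i in combo))
            if hits:                                  # stop at the smallest k that works
                wiring[gene + "'"] = hits
                break
    return wiring

states = list(product((0, 1), repeat=3))
wiring = reveal(states, [step(s) for s in states])
print(wiring)
```

On this table the sketch recovers B as the single input of A′, the pair (A, C) for B′, and all three genes for C′. With fewer observed transitions than the full table, several input sets may satisfy the criterion at once.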


Results with REVEAL

• Liang, Fuhrman and Somogyi (1998). REVEAL: A general reverse engineering algorithm for inference of genetic network architectures, Pac Symp Biocomp.

• Shown to correctly infer small (simulated) networks if given a sufficient number of examples

• Data requirements grow exponentially, but the method can still provide likely results with limited data

• Correlation Metric Construction is a related method that is based on correlation instead of entropy (Adam Arkin and John Ross)


Modeling expression with differential equations

Assumes network behavior can be modeled as a system of linear differential equations of the form:

dx/dt = Ax + u

x is a vector representing the continuous-valued levels (concentrations) of each network component
A is the network model: an N × N matrix of coefficients describing how each xi is controlled by upstream genes xj, xk, etc.
u is a vector representing an external additive perturbation to the system


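An example: From discrete- to continuous-valued networks

[Figure: network diagrams of x1, x2, x3]

Three genes: x1, x2, x3

x1 activates x2

x2 activates x1 and x3

x3 inhibits x1

dx/dt = Ax + u

$$\frac{dx_1}{dt} = a_{12}x_2 - a_{13}x_3, \qquad \frac{dx_2}{dt} = a_{21}x_1, \qquad \frac{dx_3}{dt} = a_{32}x_2$$

$$\frac{d}{dt}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 & a_{12} & -a_{13} \\ a_{21} & 0 & 0 \\ 0 & a_{32} & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$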


The steady state assumption

• Near a steady-state point, expression levels do not change over time.

• Under the steady-state assumption, the model reduces to 0 = Ax + u, i.e. Ax = −u (the slides that follow write this as Ax = u, which amounts to absorbing the sign into u).

• A straightforward method to infer A would be to apply N perturbations, u, to the network, in each case measuring steady-state expression levels for the x.

• However, in larger networks it may be impractical to apply so many perturbations

• As a simplifying assumption, consider that each gene has a maximum of k non-zero regulatory inputs.
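The steady-state condition can be checked numerically on a small system. This is a sketch under stated assumptions: the coefficient values are hypothetical, and self-degradation terms (the −1 entries on the diagonal) are added to the three-gene example so that a stable steady state exists; neither comes from the slides. The steady state is obtained by solving 0 = Ax + u directly and compared with a forward-Euler simulation of dx/dt = Ax + u.

```python
import numpy as np

# Hypothetical coefficients: x1 activates x2, x2 activates x1 and x3,
# x3 inhibits x1, plus assumed self-degradation on the diagonal.
A = np.array([
    [-1.0,  0.5, -0.8],
    [ 0.6, -1.0,  0.0],
    [ 0.0,  0.7, -1.0],
])
u = np.array([1.0, 0.0, 0.0])    # constant external perturbation of x1

# Steady state: 0 = A x + u, so solve A x = -u
x_ss = np.linalg.solve(A, -u)

# Forward-Euler integration of dx/dt = A x + u, starting from zero
x = np.zeros(3)
dt = 0.01
for _ in range(5000):
    x = x + dt * (A @ x + u)

print("solved:   ", x_ss)
print("simulated:", x)
```

With these values the simulated trajectory settles onto the solved steady state, which is the quantity that would be measured after each perturbation.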


The inference procedure

Ax = u

• Infer inputs to each gene separately

• For the given gene, consider all possible combinations of the k regulatory inputs

• For each combination, use multiple linear regression to determine optimal values of the k coefficients

• Choose the combination that fits the observed data with the least error
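The procedure above can be sketched as an exhaustive search over input combinations with a least-squares fit for each. This is an illustrative sketch, not the authors' code: it assumes the convention Ax = u, noise-free synthetic data, one perturbation per gene, and a hypothetical four-gene network with k = 2 inputs per gene; all names and values are made up for the example.

```python
from itertools import combinations
import numpy as np

def infer_row(X, u_i, k):
    """Best k-input row of A for one gene, assuming A x = u.

    X   : genes x experiments matrix of steady-state expression levels
    u_i : perturbation applied to this gene in each experiment
    """
    n_genes = X.shape[0]
    best_err, best_row = None, None
    for combo in combinations(range(n_genes), k):
        idx = list(combo)
        design = X[idx, :].T                      # experiments x k
        coef, *_ = np.linalg.lstsq(design, u_i, rcond=None)
        err = float(np.sum((design @ coef - u_i) ** 2))
        if best_err is None or err < best_err:    # keep the least-error combination
            row = np.zeros(n_genes)
            row[idx] = coef
            best_err, best_row = err, row
    return best_row

def infer_network(X, U, k):
    """Infer each row of A separately and stack the rows."""
    return np.vstack([infer_row(X, U[i], k) for i in range(X.shape[0])])

# Hypothetical sparse network: at most two non-zero inputs per gene
A_true = np.array([
    [-1.0,  0.5,  0.0,  0.0],
    [ 0.6, -1.0,  0.0,  0.0],
    [ 0.0,  0.7, -1.0,  0.0],
    [ 0.0,  0.0, -0.4, -1.0],
])
U = np.eye(4)                       # one unit perturbation per gene
X = np.linalg.solve(A_true, U)      # synthetic steady states satisfying A X = U
A_hat = infer_network(X, U, k=2)
print(np.round(A_hat, 3))
```

The number of combinations grows as N choose k, which is why the bound k on inputs per gene matters; with measurement noise, the least-error choice is no longer guaranteed to be the true input set.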


Multiple regression

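[Figure: multiple regression. One panel plots u against a single input x with the fitted line u = Ax (A is the fit); another plots u against two inputs, x1 and x2.]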


Review of paper by Gardner et al:

Gardner, di Bernardo, Lorenz, and Collins. Inferring Genetic Networks and Identifying Compound Mode of Action via Expression Profiling. Science 301, pp.102-105 (2003)


Overview

• Systems study of the Escherichia coli DNA damage (SOS) response, a well-studied pathway.

• Systematic transcriptional perturbations (overexpressions) to nine SOS genes, characterized by the steady-state gene expression response of each.

• Use of multiple linear regression to determine a network of causal relations (connections) among these genes.

• A recent example from a larger body of work using systems of linear equations to infer regulatory networks.


Application to E. coli SOS pathway

• The SOS pathway regulates cell survival and repair after exposure to DNA damage

• The known pathway involves three tiers of transcription factors (TFs):

1) lexA and recA
2) ssb, recF, dinI, umuDC
3) rpoD, rpoH, rpoS (so-called ‘sigma’ factors)

And more than 30 downstream regulated genes…

• The known network involving these nine core genes was chosen as a proof-of-principle of the linear regression inference approach


Diagram of SOS pathway interactions


Experimental perturbations

pBADX53 plasmid

In each perturbation, a different gene (of the nine) was overexpressed with an arabinose-controlled expression plasmid

RBS = Ribosome Binding Site


Experimental perturbations


Experimental measurements

• For each perturbation and for each of nine transcripts, steady-state expression levels were measured with quantitative real-time polymerase chain reaction (qPCR).

• The ratio of these perturbed levels to the unperturbed levels was computed.

• The mean and standard error were computed over 16 replicate measurements.


Model inference

These data were used as a training set to solve for the coefficients in the matrix A, i.e. the regulatory interaction model. The assumed number of inputs per gene was k = 5.


Diagram of SOS pathway interactions


Actual method performance

• The inferred network was compared to the known ‘test’ network.

• Performance was evaluated as the number of connections in the test network that were resolved in the inferred network.

• Here, resolved means that there was a path between the two genes in the inferred network and that the overall sign (+ or −, activation or inhibition) was also correct. Wait a minute, what are the implications of this?

Coverage = identified connections / total true connections
False Positive Rate = incorrectly identified connections / total identified connections
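The two metrics can be computed from sets of signed connections. Note the simplification in this sketch: it counts a connection as identified only when the same direct, signed edge appears in both networks, which is stricter than the path-based definition of "resolved" given above. The gene names and edges are hypothetical placeholders, not the SOS network.

```python
def coverage_and_fpr(true_edges, inferred_edges):
    """Coverage and false positive rate as defined on the slide.

    Edges are (regulator, target, sign) tuples; matching here is by direct
    signed edge, a simplifying assumption.
    """
    true_edges = set(true_edges)
    inferred_edges = set(inferred_edges)
    identified = true_edges & inferred_edges     # correct connections
    incorrect = inferred_edges - true_edges      # wrong edge or wrong sign
    coverage = len(identified) / len(true_edges)
    fpr = len(incorrect) / len(inferred_edges)
    return coverage, fpr

# Hypothetical toy networks
true_net = {("g1", "g2", "+"), ("g2", "g3", "+"), ("g3", "g1", "-"), ("g1", "g4", "+")}
inferred_net = {("g1", "g2", "+"), ("g2", "g3", "+"), ("g1", "g4", "+"),
                ("g3", "g1", "+"),   # right pair, wrong sign
                ("g4", "g2", "-")}   # connection not in the true network

coverage, fpr = coverage_and_fpr(true_net, inferred_net)
print(coverage, fpr)
```

Here three of the four true connections are recovered (coverage 0.75) and two of the five inferred connections are wrong (false positive rate 0.4). Counting a path as a hit, as the slide does, would typically raise coverage, since an indirect route through other genes can stand in for a missing direct edge.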


[Figure: simulated algorithm performance for 9 perturbations and for a 7-perturbation subset, shown alongside the actual experimental data; noise = Sx/x]


[Figure: simulated algorithm performance for 9 perturbations and for a 7-perturbation subset, shown alongside the actual experimental data]


Using the model predictively

• To what extent can the model predict expression changes that fall outside of the training set used to build it?

• Along these lines, Gardner et al. use the model to distinguish expression levels of genes that are directly targeted by a drug (the mode of action or MOA) vs. secondary effects.

• The direct targets represent the minimal set of genes that produce the observed expression pattern when externally perturbed.


Procedure for identifying drug MOA

• Measure expression changes $x_p$ resulting from treatment with the drug

• The drug effect is an unknown external perturbation $u_p$ that produces the changes:

$u_p = A x_p$

As proof-of-concept, the following experiments were performed:

• lexA/recA double perturbation
• Mitomycin C (MMC) perturbation, known to activate recA through DNA damage
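The idea can be illustrated on synthetic data: expression changes spread through the network, but applying the model recovers where the perturbation entered. This is a sketch with a hypothetical three-gene matrix A and a simulated "drug" that directly hits one gene; it follows the slide's convention u_p = A x_p and is not a reproduction of the paper's analysis.

```python
import numpy as np

genes = ["g1", "g2", "g3"]

# Hypothetical network model (assumed already inferred)
A = np.array([
    [-1.0,  0.5, -0.8],
    [ 0.6, -1.0,  0.0],
    [ 0.0,  0.7, -1.0],
])

# Simulated drug: a direct perturbation of g2 only
u_true = np.array([0.0, 2.0, 0.0])

# Expression changes that would be observed, under A x = u
x_p = np.linalg.solve(A, u_true)

# Recover the perturbation from the expression changes
u_p = A @ x_p
target = genes[int(np.argmax(np.abs(u_p)))]

print("expression changes x_p:", np.round(x_p, 3))
print("recovered u_p:         ", np.round(u_p, 3))
print("predicted direct target:", target)
```

All three genes change expression in x_p, so the expression data alone do not single out the target, whereas u_p is non-zero only for the directly perturbed gene. With real, noisy measurements the recovered u_p would need a significance threshold rather than a simple maximum.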


Identifying compound mode of action

[Figure: results for two perturbations, recA/lexA and MMC]

The model is much more predictive than are expression data alone…