Analysis of time-course gene expression data Shyamal D. Peddada Biostatistics Branch National Inst....

Post on 14-Dec-2015


Analysis of time-course gene expression data

Shyamal D. Peddada

Biostatistics Branch

National Inst. Environmental Health Sciences (NIH)

Research Triangle Park, NC

Outline of the talk

Some objectives for performing “long series” time-course experiments

A. Single cell-cycle experiment

– A nonlinear regression model
– Phase angle of a cell cycle gene
– Inference
– Open research problems

B. Multiple cell-cycle experiments

– “Coherence” between multiple cell-cycle experiments
– Illustration
– Open research problems

Objectives

Some genes play an important role during the cell division cycle process. They are known as “cell-cycle genes”.

Objectives: Investigate various characteristics of cell-cycle and/or circadian genes such as:

– Amplitude of initial expression
– Period
– Phase angle of expression (angle of maximum expression for a cell cycle gene)

Phases in cell division cycle

A brief description

• G1 phase:

"GAP 1". For many cells, this phase is the major period of cell growth during its lifespan.

• S ("Synthesis") phase:

DNA replication occurs.

A brief description

• G2 phase:

"GAP 2": Cells prepare for M phase. The G2 checkpoint prevents cells from entering mitosis if DNA has been damaged since the last division, providing an opportunity for DNA repair and stopping the proliferation of damaged cells.

• M (“Mitosis”) phase:

Nuclear (chromosomes separate) and cytoplasmic (cytokinesis) division occur. Mitosis is further divided into 4 phases.

Single, long series experiment …

Whitfield et al. (Molecular Biology of the Cell, 2002)

Basic design is as follows:

Experimental units: Human cancer cells (HeLa)

Microarray platform: cDNA chips with approximately 43,000 probes (i.e., roughly 29,000 genes)

3 different patterns of time points (i.e. 3 different experiments)

One of the goals of these experiments was to identify periodically expressed genes.

Whitfield et al. (Molecular Biology of the Cell, 2002)

Experiment 1: (26 time points)

HeLa cancer cells arrested in the S-phase using double thymidine block.

Sampling times after arrest (hrs):

– 0 1 2 3 4 5 6 7 8 9 10 11 12 14 15 16 18 20 22 24 26 28 32 36 40 44.

Whitfield et al. (2002)

Experiment 2: (47 time points)

HeLa cancer cells arrested in the S-phase using double thymidine block.

Sampling times after arrest (hrs):

– every hour between 0 and 46.

Whitfield et al. (2002)

Experiment 3: (19 time points)

HeLa cancer cells arrested in the M-phase using thymidine and then nocodazole.

Sampling times after arrest (hrs):

– 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36.

Whitfield et al. (2002): Phase marker genes

Cell Cycle Phase   Genes
----------------   -----
G1/S               CCNE1, CDC6, PCNA, E2F1
S                  RFC4, RRM2
G2                 CDC2, TOP2A, CCNA2, CCNF
G2/M               STK15, CCNB1, PLK, BUB1
M/G1               VEGFC, PTTG1, CDKN3, RAD21

Questions

Can we describe the gene expression of a cell-cycle gene as a function of time?

Can we determine the phase angle for a given cell-cycle gene? i.e. can we quantify the previous table in terms of angles on a circle?

What is the period of expression for a given gene?

Can we test the hypothesis that all cell-cycle genes share the same time period?

Etc.

[Figure: Profile of PCNA based on experiment 2 data]

Some important observations

1. Gene expression has a sinusoidal shape

2. Gene expression for a given gene is an average value of mRNA levels across a large number of cells

3. Duration of cell cycle varies stochastically across cells

4. Initially cells are synchronized but over time they fall out of synchrony

5. Gene expression of a cell-cycle gene is expected to “decrease/decay” over time. This is because of items 2 and 4 listed above!
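The link between observations 2 to 4 and the decay in observation 5 can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: each cell follows a pure cosine, cell-to-cell periods are log-normally distributed around 15 hours, and the names `population_average`, `mean_period`, `sigma` and `n_cells` are hypothetical.

```python
import numpy as np

def population_average(times, mean_period=15.0, sigma=0.1, n_cells=5000, seed=0):
    """Average a cosine signal over cells whose cycle durations vary randomly.

    Assumption (illustrative only): each cell has period mean_period * exp(sigma * z)
    with z standard normal, and all cells start perfectly synchronized at t = 0.
    """
    rng = np.random.default_rng(seed)
    periods = mean_period * np.exp(sigma * rng.standard_normal(n_cells))
    times = np.asarray(times, dtype=float)
    # Rows index time points, columns index cells.
    signals = np.cos(2.0 * np.pi * times[:, None] / periods[None, :])
    # The measured expression is an average over the whole cell population.
    return signals.mean(axis=1)

times = np.arange(0, 47)          # hourly sampling, 0 to 46 hours
avg = population_average(times)
early_peak = np.abs(avg[times <= 15]).max()   # cells still synchronized
late_peak = np.abs(avg[times >= 30]).max()    # cells have drifted apart
print(early_peak, late_peak)
```

As the cells drift out of phase their individual oscillations partially cancel, so the averaged curve oscillates with a shrinking amplitude even though no single cell's signal decays.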

Random Periods Model (PNAS, 2004)

$$f(t) = a + bt + \frac{K}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos\left(\frac{2\pi t}{T\exp(\sigma z)} + \phi\right) \exp\left(-\frac{z^2}{2}\right) dz$$

• a and b: background drift parameters
• K: the initial amplitude
• T: the average period
• σ: the attenuation parameter
• φ: the phase angle
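A minimal numerical sketch of the random periods model mean curve, f(t) = a + bt + (K/√(2π)) ∫ cos(2πt / (T exp(σz)) + φ) exp(−z²/2) dz, is given below. The function name `rpm_mean` and the use of Gauss-Hermite quadrature are choices made here for illustration, not something stated in the slides.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def rpm_mean(t, a, b, K, T, sigma, phi, n_nodes=64):
    """Evaluate the random periods model mean curve at time(s) t.

    The integral over z is approximated with probabilists' Gauss-Hermite
    quadrature, whose weight function is exp(-z**2 / 2).
    """
    z, w = hermegauss(n_nodes)                    # nodes and weights
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # Each node z corresponds to a sub-population with period T * exp(sigma * z).
    arg = 2.0 * np.pi * t[:, None] / (T * np.exp(sigma * z[None, :])) + phi
    integral = (np.cos(arg) * w[None, :]).sum(axis=1)
    return a + b * t + K / np.sqrt(2.0 * np.pi) * integral

# With sigma = 0 every cell has period T, so the curve is a drift plus a pure cosine.
t = np.array([0.0, 3.0, 7.5])
print(rpm_mean(t, a=1.0, b=0.5, K=2.0, T=15.0, sigma=0.0, phi=0.3))
```

Setting `sigma` above zero spreads the periods and attenuates the oscillation over time, which is the role of the attenuation parameter.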

[Figure: Fitted curves for some phase marker genes]

Whitfield et al. (2002): Phase marker genes

Phase   Genes                         Phase angles (radians)
-----   -----                         ----------------------
G1/S    CCNE1, CDC6, PCNA, E2F1       0.56, 5.96, 5.87, 5.83
S       RFC4, RRM2                    5.47, 5.36
G2      CDC2, TOP2A, CCNA2, CCNF      4.24, 3.74, 3.55, 3.25
G2/M    STK15, CCNB1, PLK, BUB1       3.06, 2.67, 2.61, 2.51
M/G1    VEGFC, PTTG1, CDKN3, RAD21    2.66, 2.40, 2.25, 1.81

A hypothesis of biological interest

Do all cell cycle genes have the same T and the same σ, while the other 4 parameters are gene specific? i.e.

$$H_0: T_g = T,\ \sigma_g = \sigma \quad \text{for all genes } g$$

An Important Feature

Correlated data

– Temporal correlation within gene

– Gene-to-gene correlations

Test Statistic

Wald statistic for heteroscedastic linear and non-linear models

– Zhang, Peddada and Rogol (2000)
– Shao (1992)
– Wu (1986)

The Null Distribution

Due to the underlying correlation structure

– Asymptotic approximation is not appropriate.

– Use moving-blocks bootstrap technique on the residuals of the nonlinear model.

Kunsch (1989)

Moving-blocks Bootstrap

Step 1: Fit the null model to the data and compute the residuals.

Step 2: Draw a simple random sample (with replacement) from all possible blocks, of a specific size, of consecutive residuals.

Moving-blocks Bootstrap

Step 3: Add these residuals to the fitted curve under the null hypothesis to obtain the bootstrap data set

Step 4: Using the bootstrap data, fit the model under the alternative hypothesis and compute the Wald statistic.

Moving-blocks Bootstrap

Step 5: Repeat the above steps a large number of times.

Step 6: The bootstrap p-value is the proportion of the above Wald statistics that exceed the Wald statistic determined from the actual data.
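The resampling in Step 2 and the p-value in Step 6 can be sketched as follows. This is a hedged illustration only: the function names, the block size and the placeholder arrays are assumptions, and fitting the nonlinear model and computing the Wald statistic (Steps 1 and 4) are left out.

```python
import numpy as np

def moving_block_resample(residuals, block_size, rng):
    """Build a bootstrap residual series from randomly chosen blocks of
    consecutive residuals, drawn with replacement (Step 2)."""
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    n_blocks = -(-n // block_size)                       # ceiling division
    # Every start index yields a full block of consecutive residuals.
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    blocks = [residuals[s:s + block_size] for s in starts]
    return np.concatenate(blocks)[:n]                    # trim to original length

def bootstrap_p_value(observed_stat, bootstrap_stats):
    """Proportion of bootstrap statistics exceeding the observed one (Step 6)."""
    return float(np.mean(np.asarray(bootstrap_stats) > observed_stat))

rng = np.random.default_rng(1)
fitted_null = np.zeros(10)        # placeholder for the fitted curve under H0
residuals = np.arange(10.0)       # placeholder residuals from the null fit
# Step 3: add resampled residuals to the null fit to get one bootstrap data set.
bootstrap_data = fitted_null + moving_block_resample(residuals, 3, rng)
print(bootstrap_data)
```

Resampling whole blocks rather than single residuals keeps the short-range temporal correlation within each block, which is why the technique suits correlated time-course data.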

Analysis of experiment 2

The bootstrap p-value for testing

$$H_0: T_g = T,\ \sigma_g = \sigma$$

using Experiment 2 data of Whitfield et al. (2002) is 0.12.

Thus the hypothesis of a common period and a common attenuation parameter is not rejected, and our model is biologically plausible.

Statistical inferences on the phase angle φ

Multiple experiments

Some questions of interest

How to evaluate or combine results from multiple cell division cycle experiments?

– Are the results “consistent” across experiments?

How to evaluate this? What could be a possible criterion?

Data

$\hat\phi_{g,i}$: RPM estimate of the phase angle of a cell-cycle gene ‘g’ from the $i^{th}$ experiment.

Representation using a circle

Consider 4 cell cycle genes A, B, C, D. The vertical line in the circle denotes the reference line. The angles are measured in the counter-clockwise direction.

Thus the sequential order of expression in this example is A, B, D, C.

[Figure: circle showing the positions of genes A, B, C, D]

“Coherence” in multiple cell-cycle experiments

A group of cell cycle genes is said to be coherent across experiments if the sequential order of their phase angles is preserved across experiments.

[Figure: three circles, labeled Exp 1, Exp 2 and Exp 3, each showing the positions of genes A, B, C, D]

Geometric Representation

We shall represent phase angles from multiple cell cycle experiments using concentric circles.

Each circle represents an experiment.

Same gene from a pair of experiments is connected by a line segment.

– A figure with non-intersecting lines indicates perfect coherence.

– If there is no coherence at all then there will be many intersecting lines.
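For true (error-free) phase angles, the definition of coherence reduces to a comparison of circular orders. The sketch below assumes that "preserving the sequential order" means the two gene orderings agree up to a shift of the starting point on the circle. The function names and the example angles are hypothetical.

```python
import math

def circular_order(phase_angles):
    """Return gene names sorted by phase angle (radians, reduced modulo 2*pi)."""
    two_pi = 2.0 * math.pi
    return [gene for gene, _ in sorted(phase_angles.items(),
                                       key=lambda item: item[1] % two_pi)]

def same_circular_order(order_a, order_b):
    """True if order_b is a cyclic shift of order_a (gene names assumed unique)."""
    if sorted(order_a) != sorted(order_b):
        return False
    n = len(order_a)
    if n == 0:
        return True
    start = order_b.index(order_a[0])
    return all(order_a[i] == order_b[(start + i) % n] for i in range(n))

# Hypothetical phase angles: exp2 is exp1 rotated by 3 radians, exp3 swaps B and D.
exp1 = {"A": 0.3, "B": 1.2, "D": 2.5, "C": 4.0}
exp2 = {gene: angle + 3.0 for gene, angle in exp1.items()}
exp3 = {"A": 0.3, "B": 2.5, "D": 1.2, "C": 4.0}

print(same_circular_order(circular_order(exp1), circular_order(exp2)))
print(same_circular_order(circular_order(exp1), circular_order(exp3)))
```

With estimated rather than true phase angles, an exact order check like this is too strict, which is what motivates a statistical measure of coherence.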

Example: Perfectly Coherent (figure)

Example: Perfectly Coherent (figure)

Example: No coherence (figure)

Estimated Phase Angles

Due to statistical errors in estimation, the estimated phase angles from multiple cell cycle experiments need not preserve the sequential order even though the true phase angles are in a sequential order.

How to evaluate coherence?

Some background on regression for circular data

[Figure: two circles, Experiment A and Experiment B, with the estimated phase angles $\hat\phi_{1,1}, \hat\phi_{2,1}, \hat\phi_{3,1}$ and $\hat\phi_{1,2}, \hat\phi_{2,2}, \hat\phi_{3,2}$ marked]

Question: Can we determine a rotation matrix A such that we can rotate the circle representing Experiment A to obtain the circle representing Experiment B?

Angle of rotation for a rigid body

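Yes! By solving the following minimization problem over the set S of 2×2 rotation matrices:

$$\min_{A \in S} \sum_{g=1}^{n} \|\hat\phi_{g,2} - A\hat\phi_{g,1}\|^2,$$

with solution

$$\hat A = \begin{pmatrix} \cos\hat\theta_{u|v} & \sin\hat\theta_{u|v} \\ -\sin\hat\theta_{u|v} & \cos\hat\theta_{u|v} \end{pmatrix}$$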
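A small sketch of the least-squares rotation between two sets of phase angles follows. It treats each angle as a unit vector (cos φ, sin φ) and returns the angle δ for which φ₂ ≈ φ₁ + δ. Note the assumptions: the function names are hypothetical, and the matrix built here rotates counter-clockwise, so its layout is the transpose of the one written on the slide (the two conventions differ only in the sign of the angle).

```python
import numpy as np

def estimate_rotation_angle(phi_1, phi_2):
    """Angle delta minimizing sum ||u(phi_2) - R(delta) u(phi_1)||^2,
    where u(phi) = (cos phi, sin phi) and R is a counter-clockwise rotation.

    Expanding the squared norm shows the minimizer maximizes
    sum cos(phi_2 - phi_1 - delta), which gives the atan2 formula below.
    """
    d = np.asarray(phi_2, dtype=float) - np.asarray(phi_1, dtype=float)
    return float(np.arctan2(np.sin(d).sum(), np.cos(d).sum()))

def rotation_matrix(delta):
    """Counter-clockwise 2x2 rotation matrix (a convention chosen for this sketch)."""
    c, s = np.cos(delta), np.sin(delta)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(0)
phi_1 = rng.uniform(0.0, 2.0 * np.pi, size=20)     # hypothetical phase angles
phi_2 = (phi_1 + 0.7) % (2.0 * np.pi)              # same genes, rotated by 0.7 rad
delta_hat = estimate_rotation_angle(phi_1, phi_2)
print(delta_hat)
```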

Determination of Coherence Across “k” Experiments

The Basic Idea

Consider a rigid body rotating in a plane. Suppose the body is perfectly rigid with no deformations.

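Let $A_{i \to i+1}$ denote the 2x2 rotation matrices from experiment i to i+1 (k+1 = 1). Then

$$A_{1 \to 2} A_{2 \to 3} A_{3 \to 4} \cdots A_{k-1 \to k} = A_{1 \to k}$$

Alternatively

$$A_{1 \to 2} A_{2 \to 3} A_{3 \to 4} \cdots A_{k-1 \to k} A'_{1 \to k} = I \iff A_{1 \to 2} A_{2 \to 3} A_{3 \to 4} \cdots A_{k-1 \to k} A_{k \to 1} = I$$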

The Basic Idea

Equivalently, if

$$A_{i \to i+1} = \begin{pmatrix} \cos\hat\theta_{i+1|i} & \sin\hat\theta_{i+1|i} \\ -\sin\hat\theta_{i+1|i} & \cos\hat\theta_{i+1|i} \end{pmatrix}$$

then under perfect rigid body motion we should have

$$\cos\left(\sum_{i=1}^{k} \hat\theta_{i+1|i}\right) = 1$$
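The step from the matrix condition (the product of the k rotation matrices equals I) to the cosine condition can be spelled out as follows. Here $R(\theta)$ and $\theta_i$ are shorthand introduced only for this derivation: $R(\theta)$ is a 2x2 rotation matrix of the form used above, and $\theta_i$ stands for the rotation angle from experiment i to i+1.

```latex
R(\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix},
\qquad
R(\theta_1)\,R(\theta_2)
= \begin{pmatrix}
\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2 & \cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2 \\
-\sin\theta_1\cos\theta_2 - \cos\theta_1\sin\theta_2 & \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2
\end{pmatrix}
= R(\theta_1 + \theta_2)

\prod_{i=1}^{k} R(\theta_i) = R\Big(\sum_{i=1}^{k} \theta_i\Big) = I
\iff \sum_{i=1}^{k} \theta_i \equiv 0 \pmod{2\pi}
\iff \cos\Big(\sum_{i=1}^{k} \theta_i\Big) = 1
```

In words: planar rotations compose by adding their angles, so going once around all k experiments returns to the starting configuration exactly when the angles sum to a multiple of 2π.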

Problem!

In the present context we do NOT necessarily have a rigid body!

– Not all experiments are performed with same precision.

– The time axis may not be constant across experiments.

– Number of time points may not be same across experiments.

– Etc.

Example: Not a rigid motion but perfectly coherent (figure)

Consequence

Rotation matrix A alone may not be enough to bring two circles to congruence!

An additional “association/scaling” parameter may be needed, as seen in the previous figure!

Circular-Circular regression model for a pair of experiments (Downs and Mardia, 2002)

For $g = 1, 2, \ldots, G$, let $(\hat\phi_{g,1}, \hat\phi_{g,2})$ denote a pair of angular variables.

Suppose $\hat\phi_{g,2} \mid \hat\phi_{g,1}$ is von-Mises distributed with mean direction $\mu$ and concentration parameter $\kappa$.

Circular-Circular Regression Model (Downs and Mardia, 2002)

The regression model is given by the link function

$$\tan\left(\frac{\mu - \alpha_{2|1}}{2}\right) = \omega_{2|1} \tan\left(\frac{\hat\phi_{g,1} - \beta_{2|1}}{2}\right),$$

where

$\theta_{2|1} = \alpha_{2|1} - \beta_{2|1}$ = the angle of rotation

$\omega_{2|1}$ = "association parameter"

$0 \le \omega_{2|1} \le 1, \quad -\pi \le \theta_{2|1} \le \pi$
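To see what the link function tan((μ − α)/2) = ω tan((φ − β)/2) does, it can be solved for the mean direction μ. The sketch below is illustrative only: the function name `dm_mean_direction` and the parameter names `alpha`, `beta`, `omega` are chosen here to mirror the symbols, and no model fitting is shown.

```python
import numpy as np

def dm_mean_direction(phi, alpha, beta, omega):
    """Mean direction mu implied by tan((mu - alpha)/2) = omega * tan((phi - beta)/2).

    Solving for mu gives mu = alpha + 2 * arctan(omega * tan((phi - beta)/2)),
    reduced here to the interval [0, 2*pi).
    """
    phi = np.asarray(phi, dtype=float)
    mu = alpha + 2.0 * np.arctan(omega * np.tan((phi - beta) / 2.0))
    return mu % (2.0 * np.pi)

# omega = 1: pure rotation by theta = alpha - beta (here 0.3 radians).
print(dm_mean_direction(1.0, alpha=0.5, beta=0.2, omega=1.0))
# omega = 0: the predicted direction no longer depends on phi at all.
print(dm_mean_direction(2.0, alpha=0.5, beta=0.2, omega=0.0))
```

The two limiting cases show why ω acts as an association parameter: at ω = 1 the second experiment is a rigid rotation of the first, while at ω = 0 the first experiment carries no information about the second.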

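Back to the toy examples

$$(\hat\omega_{B|A}, \hat\omega_{C|B}, \hat\omega_{A|C}) = (1, 1, 1), \quad |\hat\theta_{B|A} + \hat\theta_{C|B} + \hat\theta_{A|C}| = 0$$

$$(\hat\omega_{B|A}, \hat\omega_{C|B}, \hat\omega_{A|C}) = (.64, .34, .20), \quad |\hat\theta_{B|A} + \hat\theta_{C|B} + \hat\theta_{A|C}| = 0$$

$$(\hat\omega_{B|A}, \hat\omega_{C|B}, \hat\omega_{A|C}) \approx (0, 0, 0), \quad |\hat\theta_{B|A} + \hat\theta_{C|B} + \hat\theta_{A|C}| \approx 2.2$$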

Determination Of Coherence

Suppose we have K experiments, labeled as 1, 2, 3, …, K. Let $\hat\theta_{i|j}$ denote the angle of rotation for the regression of i on j for a group of g genes.

Compute

$$\left|\sum_{i=1}^{K} \hat\theta_{i+1|i}\right|$$

Note $K + 1 \equiv 1$.

Determination Of Coherence

We expect

$$\left|\sum_{i=1}^{K} \hat\theta_{i+1|i}\right|$$

under no coherence to be “stochastically” larger than

$$\left|\sum_{i=1}^{K} \hat\theta_{i+1|i}\right|$$

under coherence.

Comparison of Cumulative Distribution Functions

[Figure: Blue line: Coherence. Pink line: No Coherence.]

Determination Of Coherence

For a given data set compute

$$c = \left|\sum_{i=1}^{K} \hat\theta_{i+1|i}\right|$$

Generate the bootstrap distribution of

$$\left|\sum_{i=1}^{K} \hat\theta_{i+1|i}\right|$$

under the null hypothesis of no coherence.

Bootstrap P-value For Coherence

Let $\hat\theta^*_{i+1|i}$ denote the angle of rotation using the bootstrap sample. Then the P-value is:

$$P\left(\left|\sum_{i=1}^{K} \hat\theta^*_{i+1|i}\right| \le c\right)$$

Illustration: Whitfield et al. data

There are 3 experiments. The phase angle of each gene was estimated using the model of Liu et al. (2004).

A total of 47 common cell-cycling genes were selected from the three experiments.

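Estimates

The estimated values of interest are

$$(\hat\omega_{2|1}, \hat\omega_{3|2}, \hat\omega_{1|3}) = (0.67, 0.70, 0.64),$$

$$(\hat\theta_{2|1}, \hat\theta_{3|2}, \hat\theta_{1|3}) = (0.5, -3.03, 2.59)$$

Note that

$$|\hat\theta_{2|1} + \hat\theta_{3|2} + \hat\theta_{1|3}| = 0.06 \text{ radians}$$

$$P(|\hat\theta^*_{2|1} + \hat\theta^*_{3|2} + \hat\theta^*_{1|3}| \le 0.06) \approx 0.029$$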

Conclusion

Since the bootstrap P-value < 0.05, we conclude that the three experiments are coherent.

Column groups: Phase (rad) = A, B, C; Res (rad) = B - B|A, C - C|B, A - A|C; Dispersion (rad) = Cir_dist.

Accession Gene Symbol          A      B      C  B - B|A  C - C|B  A - A|C  Cir_dist
--------- -------------    -----  -----  -----  -------  -------  -------  --------
AA135809  EST              0.882  0.040  3.399    -0.29     0.66    -0.10      0.04
W93120    EST              0.260  0.427  2.580     0.52    -0.58     0.53      0.21
T54121    CCNE1*           1.191  0.559  2.661     0.02    -0.65     1.35      0.33
AA131908  FLJ10540         3.534  2.220  6.186    -0.65     0.65    -0.08      0.25
AA088457  EST              2.613  2.373  5.700     0.66    -0.02    -0.68      0.08
AA464019  E2-EPF           3.478  2.464  5.798    -0.33    -0.02     0.12      0.07
AA430092  BUB1             3.566  2.510  6.132    -0.41     0.26    -0.01      0.11
AA425404  FLJ10156         3.508  2.519  6.241    -0.32     0.36    -0.14      0.12
H73329    C20orf1          3.494  2.594  5.873    -0.22    -0.09     0.08      0.04
AA629262  PLK              3.314  2.613  5.888     0.05    -0.10    -0.11      0.02
AA157499  MAPK13           3.390  2.615  5.784    -0.05    -0.20     0.04      0.02
AA282935  MPHOSPH1         3.826  2.667  6.233    -0.64     0.19     0.18      0.12
AA053556  MKI67            3.600  2.731  5.665    -0.24    -0.44     0.33      0.06
AA279990  TACC3            3.804  2.810  0.275    -0.46     0.37    -0.05      0.13
AA402431  CENPE            3.556  2.892  5.939    -0.01    -0.33     0.10      0.01
R11407    STK15            3.484  2.940  5.869     0.14    -0.44     0.08      0.01
AA598776  CDC20            3.355  2.957  5.854     0.34    -0.47    -0.04      0.00
AA262211  KIAA0008         3.457  2.989  5.918     0.23    -0.44     0.02      0.00
AA421171  NUF2R            3.785  3.000  5.679    -0.24    -0.69     0.50      0.10
AA010065  CKS2             3.341  3.030  5.826     0.43    -0.57    -0.04      0.02
AA292964  CKS2             3.312  3.037  5.980     0.48    -0.42    -0.17      0.01
AA430511  FLJ14642         4.170  3.244  1.653    -0.57     1.35    -0.74      0.70
AA430511  FLJ14642         4.170  3.244  1.474    -0.57     1.17    -0.57      0.55
AA676797  CCNF             4.024  3.249  1.170    -0.35     0.86    -0.46      0.36
AA458994  PMSCL1           0.841  3.387  0.298    -0.15    -0.13     0.12      0.01
AA235662  FLJ14642         3.653  3.396  1.278     0.35     0.85    -0.92      0.51
N63744    FLJ10468         3.864  3.511  0.637     0.15     0.11    -0.23      0.07
AA620485  ANKT             3.709  3.531  0.923     0.40     0.38    -0.59      0.24
AA608568  CCNA2            3.857  3.541  6.133     0.19    -0.70     0.28      0.05
R96941    C20orf129        3.751  3.546  0.667     0.36     0.11    -0.36      0.11
AA504625  KNSL1            4.107  3.551  0.410    -0.17    -0.15     0.17      0.00
AI053446  EST              4.348  3.612  1.256    -0.45     0.65    -0.21      0.21
R22949    EST              4.164  3.631  0.161    -0.17    -0.46     0.39      0.02
AA452513  KNSL5            3.915  3.730  0.192     0.29    -0.50     0.12      0.03
T66935    DKFZp762E1312    4.193  3.884  0.800     0.04    -0.01    -0.01      0.03
AA099033  USP1*            5.000  4.760  2.876    -0.12     1.43    -1.45      0.71
AA485454  EST              4.886  5.086  0.891     0.33    -0.79     0.61      0.24
AA485454  EST              4.275  5.086  0.891     1.12    -0.79     0.00      0.44
AA485454  EST              4.886  5.235  0.891     0.48    -0.90     0.61      0.32
AA485454  EST*             4.275  5.235  0.891     1.27    -0.90     0.00      0.55
AA620553  FEN1             5.897  5.510  3.028    -0.21     1.02    -0.79      0.23
AA425120  CHAF1B           5.697  5.714  1.685     0.16    -0.49     0.76      0.16
N57722    MCM6             0.047  5.817  2.568    -0.23     0.31     0.34      0.00
AA450264  PCNA             0.195  5.858  2.438    -0.29     0.14     0.67      0.02
H51719    ORC1L            5.906  5.917  2.889     0.19     0.54    -0.57      0.14
H59203    CDC6             0.551  5.968  2.723    -0.43     0.33     0.61      0.04
R06900    RAMP             0.243  6.049  2.889    -0.13     0.42     0.06      0.00

Statistical inferences on the phase angle φ

- Some open problems

Estimation subject to inequality constraints

It is reasonable to hypothesize that for a normal cell division cycle, the p phase marker genes must express in an order around the unit circle.

Thus they must satisfy:

$$0 \le \phi_1 \le \phi_2 \le \ldots \le \phi_p \le 2\pi$$

Open problems - data from single experiment

How to estimate the phase angles subject to the simple order restriction

$$0 \le \phi_1 \le \phi_2 \le \ldots \le \phi_p \le 2\pi \ ?$$

More generally, how to estimate the phase angles subject to the isotropic simple order restriction

$$\phi_1 \le \phi_2 \le \ldots \le \phi_p \ ?$$

How to test the above hypothesis? What are the null and alternative hypotheses?

Open problems – data from multiple experiments

How do we estimate the phase angles from multiple experiments under the order restriction on the phase angles of cell cycle genes?

What are the statistical errors associated with such an estimator?

How to construct confidence intervals and test hypotheses?

Acknowledgments

Delong Liu (former Post-doc at NIEHS)
David Umbach (NIEHS)
Leping Li (NIEHS)
Clare Weinberg (NIEHS)
Pat Crocket (Constella Group)
Cristina Rueda (Univ. of Valladolid, Spain)
Miguel Fernandez (Univ. of Valladolid, Spain)