Searching for Periodic Gene Expression Patterns Using Lomb-Scargle Periodograms

37
1 Searching for Periodic Gene Expression Patterns Using Lomb-Scargle Periodograms http://research.stowers-institute.org/efg/2004/ CAMDA Critical Assessment of Microarray Data Analysis Conference Earl F. Glynn Stowers Institute Arcady R. Mushegian Stowers Institute & Univ. of Kansas Medical Center Jie Chen Stowers Institute & Univ. of Missouri Kansas City

description

Searching for Periodic Gene Expression Patterns Using Lomb-Scargle Periodograms. http://research.stowers-institute.org/efg/2004/CAMDA Critical Assessment of Microarray Data Analysis Conference November 11, 2004. Searching for Periodic Gene Expression Patterns Using Lomb-Scargle Periodograms. - PowerPoint PPT Presentation

Transcript of Searching for Periodic Gene Expression Patterns Using Lomb-Scargle Periodograms

Page 1: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

1

Searching forPeriodic Gene Expression Patterns Using Lomb-Scargle Periodograms

http://research.stowers-institute.org/efg/2004/CAMDA

Critical Assessment of Microarray Data Analysis ConferenceNovember 11, 2004

Earl F. GlynnStowers Institute

Arcady R. MushegianStowers Institute &

Univ. of Kansas Medical Center

Jie ChenStowers Institute &Univ. of Missouri

Kansas City

Page 2: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

2

Searching for Periodic Gene Expression Patterns Using Lomb-Scargle Periodograms

• Periodic Patterns in Biology

• Introduction to Lomb-Scargle Periodogram

• Data Pipeline

• Application to Bozdech’s Plasmodium dataset

• Conclusions

Page 3: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

3

Periodic Patterns in Biology

Photograph taken at Reptile Gardens, Rapid City, SDwww.reptile-gardens.com

A vertebrate’s body plan: a segmented pattern.Segmentation is established during somitogenesis.

Page 4: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

4

Periodic Patterns in Biology

From Bozdech, et al, Fig. 1A, PLoS Biology, Vol 1, No 1, Oct 2003, p 3.

Intraerythrocytic Developmental Cycle of Plasmodium falciparum

Expression Ratio = RNA from parasitized red blood cells

RNA from all development cycles=

Cy5

Cy3

Values for Log2(Expression Ratio) are approximately normally distributed.Assume gene expression reflects observed biological periodicity.

Page 5: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

5

Simple Periodic Gene Expression Model

Time

Exp

ress

ion

period(T)

period(T)

“On” “On” “On”

“Off” “Off”

frequency = 1

period

f = 1 T

Gene Expression = Constant Cosine(2f t)“Periodic” if only observed over a single cycle?

= angular frequency = 2f

Page 6: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

6

Introduction to Lomb-Scargle Periodogram

• What is a Periodogram?• Why Lomb-Scargle Instead of Fourier?• Example Using Cosine Expression Model• Mathematical Details• Mathematical Experiments

- Single Dominant Frequency

- Multiple Frequencies- Mixtures: Signal and Noise

Page 7: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

7

What is a Periodogram?

• A graph showing frequency “power” for a spectrum of frequencies

• “Peak” in periodogram indicates a frequency with significant periodicity

Time

Log2(Expression)

Periodic Signal Periodogram

Frequency

Spectral“Power”

Computation

Page 8: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

8

Why Lomb-Scargle Instead of Fourier?

• Missing data handled naturally

• No data imputation needed

• Any number of points can be used

• No need for 2N data points like with FFT

• Lomb-Scargle periodogram has known statistical properties

Note: The Lomb-Scargle algorithm is NOT equivalentto the conventional periodogram analysis basedFourier analysis.

Page 9: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

9

Lomb-Scargle PeriodogramExample Using Cosine Expression Model

T = 1 f

0 10 20 30 40

-1.0

0.0

0.5

1.0

Cosine Curve (N=48)

Time [hours]

Ex

pre

ss

ion

N = 48

0.00 0.05 0.10 0.15 0.20

05

10

20

Lomb-Scargle Periodogram

Frequency [1/hour]

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05p = 0.01

p = 0.001p = 1e-04p = 1e-05p = 1e-06

Period at Peak = 48 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.4

0.8

Peak Significance

Frequency [1/hour]

Pro

ba

bili

ty

p = 3.3e-009 at Peak

A small valuefor the false-alarm probability indicates

a highly significant periodic signal.

Evenly-spaced time points

Page 10: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

10

Lomb-Scargle PeriodogramExample Using Noisy Cosine Expression Model

0 10 20 30 40

-1.0

0.0

1.0

Cosine Curve + Noise (N=48)

Time [hours]

Ex

pre

ss

ion

N = 48

Time Interval Variability

log10(delta T)

Fre

qu

en

cy

-1.0 -0.5 0.0 0.5 1.0

02

46

8

0.00 0.05 0.10 0.15 0.20

05

10

20

Lomb-Scargle Periodogram

Frequency [1/hour]

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05p = 0.01

p = 0.001p = 1e-04p = 1e-05p = 1e-06

Period at Peak = 45.7 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.4

0.8

Peak Significance

Frequency [1/hour]

Pro

ba

bili

ty

p = 2.54e-007 at Peak

Unevenly-spaced time points

Page 11: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

11

Lomb-Scargle PeriodogramExample Using Noise

0 10 20 30 40

-1.0

0.0

0.5

1.0

Noise (N=48)

Time [hours]

Ex

pre

ss

ion

N = 48

Time Interval Variability

log10(delta T)

Fre

qu

en

cy

-1.0 -0.5 0.0 0.5 1.0

02

46

8

0.00 0.05 0.10 0.15 0.20

05

10

20

Lomb-Scargle Periodogram

Frequency [1/hour]

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05p = 0.01

p = 0.001p = 1e-04p = 1e-05p = 1e-06

Period at Peak = 7.4 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.4

0.8

Peak Significance

Frequency [1/hour]

Pro

ba

bili

ty

p = 0.973 at Peak

Page 12: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

12

Lomb-Scargle Periodogram Mathematical Details

Source: Numerical Recipes in C (2nd Ed), p. 577

PN() has an exponential probability distribution with unit mean.

Page 13: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

13

Mathematical Experiment: Single Dominant Frequency

0 10 20 30 40

-1.0

0.0

0.5

1.0

Cosine Curve (N=48)

Time [hours]

Ex

pre

ss

ion

N = 48

0.00 0.05 0.10 0.15 0.20

05

10

20

Lomb-Scargle Periodogram

Frequency [1/hour]

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05p = 0.01

p = 0.001p = 1e-04p = 1e-05p = 1e-06

Period at Peak = 24 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.4

0.8

Peak Significance

Frequency [1/hour]

Pro

ba

bili

ty

p = 3.3e-009 at Peak

Expression = Cosine(2t/24)

Single “peak” in periodogram. Single “valley” in significance curve.

Page 14: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

14

Mathematical Experiment: Multiple Frequencies

0 10 20 30 40

-2-1

01

23

Sum of 3 Cosines (N=48)

Time [hours]

Ex

pre

ss

ion

N = 48

0.00 0.05 0.10 0.15 0.20

05

10

20

Lomb-Scargle Periodogram

Frequency [1/hour]

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05p = 0.01

p = 0.001p = 1e-04p = 1e-05p = 1e-06

Period at Peak = 21.8 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.4

0.8

Peak Significance

Frequency [1/hour]

Pro

ba

bili

ty

p = 0.00246 at Peak

Expression = Cosine(2t/48) + Cosine(2t/24) + Cosine(2t/ 8)

Multiple peaks in periodogram. Corresponding valleys in significance curve.

Page 15: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

15

Mathematical Experiment: Multiple Frequencies

0 10 20 30 40

-20

24

Sum of 3 Cosines (N=48)

Time [hours]

Ex

pre

ss

ion

N = 48

0.00 0.05 0.10 0.15 0.20

05

10

20

Lomb-Scargle Periodogram

Frequency [1/hour]

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05p = 0.01

p = 0.001p = 1e-04p = 1e-05p = 1e-06

Period at Peak = 48 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.4

0.8

Peak Significance

Frequency [1/hour]

Pro

ba

bili

ty

p = 2.37e-007 at Peak

Expression = 3*Cosine(2t/48) + Cosine(2t/24) + Cosine(2t/ 8)

“Weaker” periodicities cannot always be resolved statistically.

Page 16: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

16

Mathematical Experiment: Multiple Frequencies: “Duty Cycle”

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

duty cycle: 1/2

Time [hours]

Exp

ress

ion

N = 48

0.0 0.1 0.2 0.3 0.4 0.5

05

1015

2025

Lomb-Scargle Periodogram

Frequency [1/hour]

Nor

mal

ized

Pow

er S

pect

ral D

ensi

ty

p = 0.05p = 0.01

p = 0.001

p = 1e-04

p = 1e-05

p = 1e-06

Period at Peak = 24 hours

0.0 0.1 0.2 0.3 0.4 0.5

0.0

0.2

0.4

0.6

0.8

1.0

Peak Significance

Frequency [1/hour]

Pro

babi

lity

p = 2.54e-007 at Peak

0 10 20 30 40

0.0

0.2

0.4

0.6

0.8

1.0

duty cycle: 2/3

Time [hours]

Exp

ress

ion

N = 48

0.0 0.1 0.2 0.3 0.4 0.5

05

1015

2025

Lomb-Scargle Periodogram

Frequency [1/hour]

Nor

mal

ized

Pow

er S

pect

ral D

ensi

ty

p = 0.05p = 0.01

p = 0.001

p = 1e-04

p = 1e-05

p = 1e-06

Period at Peak = 24 hours

0.0 0.1 0.2 0.3 0.4 0.5

0.0

0.2

0.4

0.6

0.8

1.0

Peak Significance

Frequency [1/hour]

Pro

babi

lity

p = 5.06e-006 at Peak

50% 66.6% (e.g., human sleep cycle)

One peak with symmetric “duty cycle”. Multiple peaks with asymmetric cycle.

Page 17: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

17

Mathematical Experiment: Mixtures: Periodic Signal Vs. Noise

'p' Histogram for 5000 Simulated Expresson Profiles (N= 48 )

log10(p)

Fre

quen

cy

-8 -6 -4 -2 0

050

100

150

p corresponding to max Periodogram Power Spectral Density100 % simulated periodic genes

'p' Histogram for 5000 Simulated Expresson Profiles (N= 48 )

log10(p)

Fre

quen

cy

-8 -6 -4 -2 0

050

010

0015

00 p corresponding to max Periodogram Power Spectral Density50 % simulated periodic genes

'p' Histogram for 5000 Simulated Expresson Profiles (N= 48 )

log10(p)

Fre

quen

cy

-8 -6 -4 -2 0

050

010

0015

0020

00

p corresponding to max Periodogram Power Spectral Density0 % simulated periodic genes

“p” histogram

100% periodic genes 50% periodic50% noise

100% noise

Page 18: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

18

0 1000 2000 3000 4000 5000

-8-6

-4-2

0

Multiple Testing Correction Methods

Rank Order of Sorted p Values

Log1

0(p)

bonferroniholmhochbergfdrnone

50 % simulated periodic genes

Mathematical Experiment: Mixtures: Periodic Signal Vs. Noise

50% periodic, 50% noise

Multiple-Hypothesis Testing

Bonferroni

Holm

Hochberg

Benjamini &Hochberg FDR

None

More False Negatives

More False Positives

Page 19: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

19

Data Pipeline to Apply to Bozdech’s Data

1. Apply quality control checks to data

2. Apply Lomb-Scargle algorithm to all expression profiles

3. Apply multiple hypothesis testing to define “significant” genes

4. Analyze biological significance of significant genes

Page 20: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

20

Bozdech’s Plasmodium dataset:

1. Apply Quality Control Checks

Global views of experiment.Remove certain outliers.

Page 21: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

21

Bozdech’s Plasmodium dataset:

1. Apply Quality Control Checks

Many missing data points require imputation for Fourier analysis.

Page 22: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

22

Bozdech’s Plasmodium dataset:

2. Apply Lomb-Scargle Algorithm

0 10 20 30 40

-0.5

-0.4

-0.3

-0.2

-0.1

0.0

Mean Expression Profile

Time [hours]

Ex

pre

ss

ion

N = 46

Time Interval Variability

log10(delta T)

Fre

qu

en

cy

-1.0 -0.5 0.0 0.5 1.0

01

02

03

04

0

0.00 0.05 0.10 0.15 0.20

05

10

15

20

25

Lomb-Scargle Periodogram

Frequency [1/hour]

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05p = 0.01

p = 0.001

p = 1e-04

p = 1e-05

p = 1e-06

Period at Peak = 27.4 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.2

0.4

0.6

0.8

1.0

Peak Significance

Frequency [1/hour]

Pro

ba

bili

ty

p = 0.0581 at Peak

Complete/06-MeanExpressionProf ile.pdf 2004-10-27 11:39

A weak diurnal period is visible in “mean” data profile.

Page 23: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

23

Bozdech’s Plasmodium dataset:

2. Apply Lomb-Scargle Algorithm

0 10 20 30 40

-2-1

01

i3518_1

Time [hours]

Ex

pre

ss

ion

N = 46

Time Interval Variability

log10(delta T)

Fre

qu

en

cy

-1.0 -0.5 0.0 0.5 1.0

01

02

03

04

0

0.00 0.05 0.10 0.15 0.20

05

10

15

20

25

Lomb-Scargle Periodogram

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05

p = 0.01

p = 0.001

p = 1e-04

p = 1e-05

p = 1e-06

Period at Peak = 45.7 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.2

0.4

0.6

0.8

1.0

Peak Significance

Pro

ba

bili

ty

p = 1.48e-008 at Peak

Periodic Expression Patterns

0 10 20 30 40

-4-2

02

opfi17638

Time [hours]

Ex

pre

ss

ion

N = 46

Time Interval Variability

log10(delta T)

Fre

qu

en

cy

-1.0 -0.5 0.0 0.5 1.0

01

02

03

04

0

0.00 0.05 0.10 0.15 0.20

05

10

15

20

25

Lomb-Scargle Periodogram

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05

p = 0.01

p = 0.001

p = 1e-04

p = 1e-05

p = 1e-06

Period at Peak = 45.7 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.2

0.4

0.6

0.8

1.0

Peak Significance

Pro

ba

bili

ty

p = 1.19e-008 at Peak

Examples of highly-significant periodic expression profiles.

Page 24: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

24

Bozdech’s Plasmodium dataset:

2. Apply Lomb-Scargle Algorithm

0 10 20 30 40

-0.5

0.0

0.5

1.0

j167_5

Time [hours]

Ex

pre

ss

ion

N = 35

Time Interval Variability

log10(delta T)

Fre

qu

en

cy

-1.0 -0.5 0.0 0.5 1.0

05

10

15

20

25

0.00 0.05 0.10 0.15 0.20

05

10

15

20

25

Lomb-Scargle Periodogram

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05

p = 0.01

p = 0.001

p = 1e-04

p = 1e-05

p = 1e-06

Period at Peak = 17.8 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.2

0.4

0.6

0.8

1.0

Peak Significance

Pro

ba

bili

ty

p = 0.998 at Peak

Aperiodic/Noise Expression Patterns

0 10 20 30 40

-1.0

-0.5

0.0

0.5

1.0

1.5

f35105_2

Time [hours]

Ex

pre

ss

ion

N = 45

Time Interval Variability

log10(delta T)

Fre

qu

en

cy

-1.0 -0.5 0.0 0.5 1.0

01

02

03

04

0

0.00 0.05 0.10 0.15 0.20

05

10

15

20

25

Lomb-Scargle Periodogram

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05

p = 0.01

p = 0.001

p = 1e-04

p = 1e-05

p = 1e-06

Period at Peak = 32 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.2

0.4

0.6

0.8

1.0

Peak Significance

Pro

ba

bili

ty

p = 0.516 at Peak

Page 25: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

25

Bozdech’s Plasmodium dataset:

2. Apply Lomb-Scargle Algorithm

0 10 20 30 40

-1.5

-1.0

-0.5

0.0

0.5

1.0

1.5

f58149_1

Time [hours]

Ex

pre

ss

ion

N = 39

Time Interval Variability

log10(delta T)

Fre

qu

en

cy

-1.0 -0.5 0.0 0.5 1.0

05

10

15

20

25

30

0.00 0.05 0.10 0.15 0.20

05

10

15

20

25

Lomb-Scargle Periodogram

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05

p = 0.01

p = 0.001

p = 1e-04

p = 1e-05

p = 1e-06

Period at Peak = 48 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.2

0.4

0.6

0.8

1.0

Peak Significance

Pro

ba

bili

ty

p = 8.54e-006 at Peak

Small “N”

N=39

0 10 20 30 40

-3-2

-10

12

n170_1

Time [hours]

Ex

pre

ss

ion

N = 32

Time Interval Variability

log10(delta T)

Fre

qu

en

cy

-1.0 -0.5 0.0 0.5 1.0

05

10

15

20

25

30

0.00 0.05 0.10 0.15 0.20

05

10

15

20

25

Lomb-Scargle Periodogram

No

rma

lize

d P

ow

er

Sp

ec

tra

l D

en

sit

y

p = 0.05

p = 0.01

p = 0.001

p = 1e-04

p = 1e-05

p = 1e-06

Period at Peak = 64 hours

0.00 0.05 0.10 0.15 0.20

0.0

0.2

0.4

0.6

0.8

1.0

Peak Significance

Pro

ba

bili

ty

p = 2.74e-005 at Peak

N=32

Page 26: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

26

Bozdech’s Plasmodium dataset:

2. Apply Lomb-Scargle AlgorithmSignal and Noise Mixture

'p' histogram

log10(p)

Num

ber

of P

robe

s

-8 -6 -4 -2 0

050

100

150

200

Complete Bozdech set of 6875 probes

Periodic Probes Aperiodic Probes or Noise

histogram-log10p.pdf 2004-11-06 10:26

Page 27: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

27

Bozdech’s Plasmodium dataset:

3. Apply Multiple-Hypothesis Testing

= 1E-4

Bonferroni

Holm

Hochberg

Benjamini &Hochberg FDR

None

More False Negatives

More False Positives

0 1000 2000 3000 4000 5000 6000 7000

-8-6

-4-2

0

Multiple Testing Correction Methods

Rank Order of Sorted p Values

Log1

0(p)

bonferroniholmhochbergfdrnone

(Using R's p.adjust methods)

p-adjust.pdf 2004-11-06 10:12

Significance

Page 28: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

28

Bozdech’s Plasmodium dataset:

3. Apply Multiple-Hypothesis Testing

p Adjustment

Method

Significance Level

0.05 0.01 0.001 0.0001 0.00001

Bonferroni 3707 3050 1461 13 0

Holm 3995 3351 1705 13 0

Hochberg 4009 3359 1723 15 0

Benjamini & Hochberg FDR

5618 5315 4906 4358 3584

None 5648 5351 4961 4456 3823A priori plan: Use Benjamini & Hochberg FDR level of 0.0001.

Observed number of periodic probes consistent with biological observation of ~60% of Plasmodium genome being transcriptionally active during the intraerythrocytic developmental cycle.

Page 29: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

29

Bozdech’s Plasmodium dataset:

4. Analyze Biological SignificanceLomb-Scargle: 4358 Probes, = 1E-4 significance

Comparison with Bozdech’s Results

Dataset Ntime series points

Probes Lomb-ScarglePeriodic

%

Bozdech Complete

43 .. 46(Bozdech

Quality Control Dataset)

5080 4115 81.0%

32 .. 42 1795 243 13.5

Total 6875 4358 63.4While Lomb-Scargle identified 243 new low “N” periodic probes, thelow percentage in that group may indicate some other problem.

Page 30: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

30

Bozdech’s Plasmodium dataset:

4. Analyze Biological SignificanceLomb-Scargle: 4358 Probes, = 1E-4 significance

Comparison with Bozdech’s Results

Unclear how to apply Bozdech’s ad hoc “Overview” criteria for use with Lomb-Scargle method: “70% power in max frequency with top 75% of max frequency magnitude.”

The best 3711 Lomb-Scargle “p” values contained 3449 (92.9%) of the Overview probes.

Dataset Probes Lomb-ScarglePeriodic

Bozdech Overview

3711 3611

Page 31: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

31

Bozdech’s Plasmodium dataset:

4. Analyze Biological Significance

Lomb-Scargle Results 4358 Probes

“Phaseograms”

Time

Pro

bes

Ord

ered

by

Pha

se

TimeP

robe

s O

rder

ed b

y P

hase

Bozdech: “Overview” Dataset2714 genes, 3395 probes

Page 32: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

32

Bozdech’s Plasmodium dataset:

4. Analyze Biological SignificanceLomb-Scargle: 4358 Probes, = 1E-4 significance

Periodogram Map

Frequency

Pro

bes

Ord

ered

by

Pea

k F

requ

ency • Shows periodograms,

not expression profiles

• Shows frequency space, not time

• Dominant frequency band corresponds to 48-hr period

•Are “weak” bands indicative of complex expression, perhaps a diurnal component, or an asymmetric “duty cycle”?Period

Page 33: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

33

SummaryLomb-Scargle Method Fourier Method

Weights data points Weights frequency intervals

No special requirement Requires uniform spacing

No special processing Missing data imputed

No special requirement 2N points for FFT; 0 padding

Known statistical properties Permutation tests needed to assess statistical properties

Use “p” values Ad hoc scoring rules

Need estimate of number of “independent frequencies” but explore using continuum

Usually only look at “independent” Fourier frequencies

Page 34: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

34

Conclusions• Lomb-Scargle periodogram is effective tool

to identify periodic gene expression profiles

• Results comparable with Fourier analysis

• Lomb-Scargle can help when data are missing or not evenly spaced

We wanted to validate the Lomb-Scargle method before applying to our somitogenesis problem, since the Fourier technique would be difficult to use. Scargle (1982): “surprising result is that the … spectrum of a process can be estimated … [with] only the order of the samples ...”

Page 35: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

35

Conclusions

• Conclusions should not be drawn using the individual p-value calculated for each profile.  A multiple comparison procedure False Discovery Rate (FDR) must be used to control the error rate.

• Expression profiles may be more complex than simple cosine curves

• Power spectra of non-sinusoid rhythms are more difficult to interpret

Page 36: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

36

Supplementary Information

http://research.stowers-institute.org/efg/2004/CAMDA

Page 37: Searching for Periodic Gene Expression Patterns  Using Lomb-Scargle Periodograms

37

Acknowledgements

                                  

Stowers Institute for Medical Research

Pourquie LabOlivier Pourquie

Mary-Lee Dequeant