Buried treasures Old statistics in new contexts “If I have seen further it is by standing on the...


Page 1: Buried treasures Old statistics in new contexts “If I have seen further it is by standing on the shoulders of giants” - Isaac Newton.
Page 2:

Buried treasures Old statistics in new contexts

Page 3:

“If I have seen further it is by standing on the shoulders of giants”

- Isaac Newton

Page 4:

One form of the past effect

- You are dealing with a statistical problem in a special context.

- You solve it by realizing a new interpretation of an old, interesting, but uncelebrated result, which was developed in a completely different context.

Page 5:

Three vignettes

V1: Genomics meets sample surveys (methodology)

V2: Bootstrapping and rank statistics (theory)

V3: Cancer genetics and stochastic geometry (application)

Page 7:

V1: Genomics meets sample surveys

Context

Second-order gene-set enrichment analysis

Buried treasure

J.W. Tukey, 1950, Some sampling simplified. J. Amer. Statist. Assoc., 45, 501-519.

John Tukey

Page 8:

Context

D Pyeon, MA Newton, PF Lambert, JA den Boon, S Sengupta, CJ Marsit, CD Woodworth, JP Connor, TH Haugen, EM Smith, KT Kelsey, LP Turek and P Ahlquist (2007).

Fundamental Differences in Cell Cycle Deregulation in Human Papillomavirus Positive and Human Papillomavirus Negative Head/Neck and Cervical Cancers. Cancer Research, 67, 4605-4619.

MA Newton, X Ma, D Sarkar, D Pyeon, and P Ahlquist (2007).

Second order enrichment analysis of microarray expression data reveals gene sets with heterogeneous activation states. Submitted.

Page 10:

[Figure: slice of expression data from Pyeon et al. 2007. Rows: genes (a few). Columns: tissue samples, HPV+ and HPV-.]

Page 11:

[Figure: density of fold changes between HPV+ and HPV- (all genes). Horizontal axis: log2 [ HPV+ / HPV- ], ticks from -2 to 2. Vertical axis: density.]

Page 12:

The post-processing problem

expression results + exogenous biology

Page 13:

Exogenous biology

B = { c: c = {genes with specific property} }

e.g.

- Gene Ontology (GO)

- Kyoto Encyclopedia of Genes and Genomes (KEGG)

Page 14:

In the HPV example, cell cycle may be an interesting gene set

Large sample variance (largest in KEGG, GO)

Excess differential expression in both directions

Page 15:

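Expression results:

\[ s = (s_1, s_2, \ldots, s_G) \]

Gene set:

\[ c \subset \{1, 2, \ldots, G\}, \qquad c \in B \]

Gene set variance (m is the number of genes in c, and \( \bar{s}(c) \) is their mean score):

\[ u(s,c) = \frac{1}{m-1} \sum_{g \in c} \left( s_g - \bar{s}(c) \right)^2 \]

Standardized statistic:

\[ z(s,c) = \frac{u(s,c) - E\{u(s,C)\}}{\sqrt{\mathrm{var}\{u(s,C)\}}} \]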

Page 16:

Centering:

\[ E\{u(s,C)\} = \frac{1}{G-1} \sum_{g=1}^{G} \left( s_g - \bar{s} \right)^2 \]

Connection: C indexes a simple random sample of genes, i.e. finite population sampling

Scaling:

\[ \mathrm{var}\{u(s,C)\} = \; ?? \]
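The centering identity can be checked directly on a toy example. The sketch below is an illustration only, not the authors' code: it enumerates every gene set of size m drawn from a small population and averages the gene set variance. The score values and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, variance


def gene_set_variance(s, c):
    """u(s, c): sample variance (divisor m - 1) of the scores s_g for g in c."""
    return variance([s[g] for g in c])


def expected_u_by_enumeration(s, m):
    """Average u(s, C) over every size-m subset C, i.e. simple random sampling."""
    G = len(s)
    return mean([gene_set_variance(s, c) for c in combinations(range(G), m)])


# Hypothetical scores for G = 7 genes (illustrative values only).
s = [0.3, -1.2, 2.0, 0.1, -0.4, 1.5, -2.2]

# Centering term: (1 / (G - 1)) * sum_g (s_g - s_bar)^2
centering = variance(s)

print(expected_u_by_enumeration(s, 3), centering)
```

The two printed numbers should agree up to floating point error for any m between 2 and G, which is the finite population sampling fact the slide relies on.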

Page 17:

We get:

\[ \mathrm{var}\{u(s,C)\} = \left( \frac{1}{m} - \frac{1}{G} \right) b_1^{T} \delta(s) + \left( \frac{2}{m-1} - \frac{2}{G-1} \right) b_2^{T} \delta(s) \]

following Tukey's 1950 calculation involving "K" functions: set-level statistics whose expected value equals the same statistic computed on the whole population

Page 18:

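\[
\delta(s) =
\begin{pmatrix}
\frac{1}{G}\, \gamma_4 \\
\frac{1}{G(G-1)} \left( \gamma_2^2 - \gamma_4 \right) \\
\frac{1}{G(G-1)} \left( \gamma_1 \gamma_3 - \gamma_4 \right) \\
\frac{1}{G(G-1)(G-2)} \left( \gamma_1^2 \gamma_2 - 2\gamma_1\gamma_3 - \gamma_2^2 + 2\gamma_4 \right) \\
\frac{1}{G(G-1)(G-2)(G-3)} \left( \gamma_1^4 + 8\gamma_1\gamma_3 + 3\gamma_2^2 - 6\gamma_1^2\gamma_2 - 6\gamma_4 \right)
\end{pmatrix},
\qquad
\begin{pmatrix} b_1 & b_2 \end{pmatrix} =
\begin{pmatrix}
1 & 0 \\
-3 & 1 \\
-4 & 0 \\
12 & -2 \\
-6 & 1
\end{pmatrix}
\]

where

\[ \gamma_k = \sum_{g=1}^{G} s_g^k \]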
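A minimal sketch of how the variance formula and the standardized statistic z(s, c) might be computed, transcribing the power sums, the vector δ(s), and the coefficient vectors b1 and b2 from the slides. The function names and the toy scores are hypothetical, and the brute-force enumeration is included only as a check on small populations; it is not part of the method.

```python
from itertools import combinations
from math import sqrt
from statistics import pvariance, variance

B1 = (1, -3, -4, 12, -6)  # coefficient vector b1 from the slide
B2 = (0, 1, 0, -2, 1)     # coefficient vector b2 from the slide


def delta(s):
    """The five-component vector delta(s), built from power sums gamma_k."""
    G = len(s)  # needs G >= 4
    g1, g2, g3, g4 = (sum(x ** k for x in s) for k in (1, 2, 3, 4))
    return (
        g4 / G,
        (g2 ** 2 - g4) / (G * (G - 1)),
        (g1 * g3 - g4) / (G * (G - 1)),
        (g1 ** 2 * g2 - 2 * g1 * g3 - g2 ** 2 + 2 * g4) / (G * (G - 1) * (G - 2)),
        (g1 ** 4 + 8 * g1 * g3 + 3 * g2 ** 2 - 6 * g1 ** 2 * g2 - 6 * g4)
        / (G * (G - 1) * (G - 2) * (G - 3)),
    )


def var_u(s, m):
    """var{u(s, C)} for a simple random sample C of m genes out of G."""
    G = len(s)
    d = delta(s)
    term1 = sum(b * x for b, x in zip(B1, d))
    term2 = sum(b * x for b, x in zip(B2, d))
    return (1 / m - 1 / G) * term1 + (2 / (m - 1) - 2 / (G - 1)) * term2


def z_score(s, c):
    """Standardized gene set variance z(s, c)."""
    u = variance([s[g] for g in c])
    return (u - variance(s)) / sqrt(var_u(s, len(c)))


def var_u_by_enumeration(s, m):
    """Brute-force check: variance of u over all size-m subsets (small G only)."""
    us = [variance([s[g] for g in c]) for c in combinations(range(len(s)), m)]
    return pvariance(us)


# Hypothetical scores for G = 7 genes (illustrative values only).
s = [0.3, -1.2, 2.0, 0.1, -0.4, 1.5, -2.2]
print(var_u(s, 3), var_u_by_enumeration(s, 3))
print(z_score(s, (1, 2, 6)))
```

Because u(s, c) does not change when a constant is added to every score, centering s before forming the power sums is one way to reduce cancellation error when G is large; that is a numerical suggestion, not something stated on the slides.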

Page 19:

V1: Genomics meets sample surveys (methodology)

V2: Bootstrapping and rank statistics (theory)

V3: Cancer genetics and stochastic geometry (application)

Page 20:

V2: Bootstrapping and rank statistics

Context

Mason and Newton, 1992, A rank statistics approach to the consistency of a general bootstrap. Ann. Statist., 20, 1611-24.

Buried treasure

J. Hajek, 1961, Some extensions of the Wald-Wolfowitz-Noether theorem. Ann. Math. Statist., 32, 506-523.

Jaroslav Hajek

Page 21:

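Data:

\[ X = (X_1, X_2, \ldots) \ \text{iid} \ (\mu, \sigma^2) \]

CLT:

\[ \frac{\sqrt{n} \left( \bar{X}_n - \mu \right)}{\sigma} \Rightarrow N[0,1] \]

Bootstrap mean (the \( M_{n,i} \) are multinomials):

\[ \bar{X}_n^{*} = \frac{1}{n} \sum_{i=1}^{n} M_{n,i}\, x_i \]

Bootstrap CLT:

\[ \frac{\sqrt{n} \left( \bar{X}_n^{*} - \bar{x}_n \right)}{s_n} \Rightarrow N[0,1] \quad \text{a.s. } x \]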

Page 22:

Generalized bootstrap: exchangeable weights

\[ \bar{X}_n^{W} = \frac{1}{n} \sum_{i=1}^{n} W_{n,i}\, x_i \]

Mason and Newton asked: what is the CLT for this case?
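To make the weighted form concrete, here is a small sketch that computes the bootstrap mean as a weighted average, once with multinomial counts (the classical bootstrap) and once with another exchangeable weight vector. The second choice, Dirichlet weights scaled to sum to n, is an assumption picked for illustration; the slides do not name a particular weight distribution. All function names and data values are hypothetical.

```python
import random


def multinomial_weights(n, rng):
    """Counts from n draws with replacement on n cells: the classical bootstrap."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


def dirichlet_weights(n, rng):
    """Exchangeable weights summing to n: normalized unit-rate gamma variates."""
    g = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(g)
    return [n * x / total for x in g]


def weighted_mean(x, w):
    """(1 / n) * sum_i w_i * x_i"""
    return sum(wi * xi for wi, xi in zip(w, x)) / len(x)


def bootstrap_means(x, weight_fn, reps, rng):
    """Replicates of the weighted mean under a given weight generator."""
    return [weighted_mean(x, weight_fn(len(x), rng)) for _ in range(reps)]


# Hypothetical data (illustrative values only).
x = [2.1, 0.4, 3.3, 1.8, 2.9, 0.7, 1.2, 2.5]
rng = random.Random(0)
classical = bootstrap_means(x, multinomial_weights, 2000, rng)
general = bootstrap_means(x, dirichlet_weights, 2000, rng)
print(sum(classical) / len(classical), sum(general) / len(general), sum(x) / len(x))
```

Both sets of replicates are centered near the sample mean; their spreads differ because the two weight vectors have different variances, which is why the scaling in a general bootstrap CLT depends on the weights.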

Page 23:

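Consider two triangular arrays of numbers

\[ \{ a_{n,i} : i = 1, 2, \ldots, n \}_{n} \qquad \{ b_{n,i} : i = 1, 2, \ldots, n \}_{n} \]

For a random permutation

\[ \left( \pi_{n,1}, \pi_{n,2}, \ldots, \pi_{n,n} \right) \]

And the sum

\[ T_n = \sum_{i=1}^{n} a_{n,\pi_{n,i}}\, b_{n,i} \]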

Page 24:

Notes about:

\[ T_n = \sum_{i=1}^{n} a_{n,\pi_{n,i}}\, b_{n,i} \]

- Linear rank statistic; studied in nonparametrics.

- Hajek 1961 gives weak conditions for asymptotic normality (AN)
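For small n the permutation distribution of a linear rank statistic can be enumerated exactly. The sketch below compares the enumerated mean and variance with the standard closed forms for a uniformly random permutation, E(T) = n * mean(a) * mean(b) and var(T) = SS(a) * SS(b) / (n - 1), where SS denotes a sum of squared deviations. Those closed forms are textbook results added here for illustration and are not quoted from the slides; the arrays and function names are hypothetical.

```python
from itertools import permutations
from statistics import mean, pvariance


def linear_rank_statistic(a, b, perm):
    """T = sum_i a[perm[i]] * b[i] for one permutation."""
    return sum(a[perm[i]] * b[i] for i in range(len(b)))


def moments_by_enumeration(a, b):
    """Exact mean and variance of T over all n! permutations (small n only)."""
    values = [linear_rank_statistic(a, b, p) for p in permutations(range(len(a)))]
    return mean(values), pvariance(values)


def moments_by_formula(a, b):
    """Closed-form mean and variance of T under a uniform random permutation."""
    n = len(a)
    a_bar, b_bar = mean(a), mean(b)
    ss_a = sum((v - a_bar) ** 2 for v in a)
    ss_b = sum((v - b_bar) ** 2 for v in b)
    return n * a_bar * b_bar, ss_a * ss_b / (n - 1)


# Hypothetical arrays with n = 5 (illustrative values only).
a = [0.5, -1.0, 2.0, 0.25, 1.5]
b = [1.0, 2.0, 0.0, 3.0, -1.0]
print(moments_by_enumeration(a, b))
print(moments_by_formula(a, b))
```

With a set to the bootstrap weights divided by n and b set to the data, the same statistic is the conditional bootstrap mean, so these are the moments that a normal approximation would standardize by.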

Page 25:

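Back to the general bootstrap problem:

Key fact: for a random permutation \( (\pi_{n,1}, \ldots, \pi_{n,n}) \),

\[ \bar{X}_n^{W} \stackrel{D}{=} \bar{X}_n^{W\pi} = \frac{1}{n} \sum_{i=1}^{n} W_{n,\pi_{n,i}}\, x_i \]

Now condition on both data \( X = x \) and weights \( W = w \):

\[ T_n = \frac{1}{n} \sum_{i=1}^{n} w_{n,\pi_{n,i}}\, x_i \]

This is precisely a linear rank statistic, and Hajek (1961) gives general conditions for its asymptotic normality.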

Page 26:

V1: Genomics meets sample surveys (methodology)

V2: Bootstrapping and rank statistics (theory)

V3: Cancer genetics and stochastic geometry (application)

Page 27:

V3: Cancer genetics and stochastic geometry

Context

Cellular events during tumor initiation, intestinal cancer

Buried treasure

P. Armitage, 1949, An overlap problem arising in particle counting. Biometrika, 36, 257-266.

Peter Armitage

Page 28:

Context

AT Thliveris, RB Halberg, L Clipson, WF Dove, R Sullivan, MK Washington, S Stanhope, and MA Newton (2005).

Polyclonality of familial murine adenomas: Analyses of mouse chimeras with low tumor multiplicity suggest short-range interactions. PNAS, 102, 6960-6965.

MA Newton, L Clipson, AT Thliveris and RB Halberg (2006).

A statistical test of the hypothesis that polyclonal intestinal tumors arise by random collision of initiated clones. Biometrics, 62, 721-7.

MA Newton (2006).

On estimating the polyclonal fraction in lineage marker studies of tumor origin. Biostatistics, 7, 503-14.

Page 31:

Monoclonal theory of tumor origin

genetic defect appears in a cell

Page 32:

Monoclonal theory of tumor origin

aberrant cell divides and persists

Page 33:

Aggregation chimeras provide data on clonality.

Page 34:

B6 Apc Min/+ Mom1 R/R <--> B6 Apc Min/+ Mom1 R/R Rosa26/+

Page 35:

B6 Apc Min/+ Mom1 R/R <--> B6 Apc Min/+ Mom1 R/R Rosa26/+

Heterotypic tumor!

Page 36:

Summary count data

mouse id   % blue tissue   total # tumors   heterotypic   pure blue   pure white   ambiguous
1          20              19               5             5           6            3
2          85              24               3             13          6            2
3          20              9                2             2           5            0
4          60              19               3             2           10           4
5          30              24               2             0           21           1
6          50              9                2             2           3            2
7          40              8                5             0           3            0
totals                     112              22            24          54           12
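As a quick consistency check on the summary counts, the sketch below re-enters the seven rows, confirms that each row's categories add up to its tumor total, and recomputes the column totals. The heterotypic fraction among non-ambiguous tumors is a simple descriptive ratio computed here for illustration, not an estimate taken from the talk; the function names are hypothetical.

```python
# (mouse id, % blue tissue, total tumors, heterotypic, pure blue, pure white, ambiguous)
ROWS = [
    (1, 20, 19, 5, 5, 6, 3),
    (2, 85, 24, 3, 13, 6, 2),
    (3, 20, 9, 2, 2, 5, 0),
    (4, 60, 19, 3, 2, 10, 4),
    (5, 30, 24, 2, 0, 21, 1),
    (6, 50, 9, 2, 2, 3, 2),
    (7, 40, 8, 5, 0, 3, 0),
]


def rows_are_consistent(rows):
    """Each total must equal heterotypic + pure blue + pure white + ambiguous."""
    return all(r[2] == sum(r[3:7]) for r in rows)


def column_totals(rows):
    """Totals for the five count columns, in table order."""
    return tuple(sum(r[j] for r in rows) for j in range(2, 7))


def heterotypic_fraction(rows):
    """Heterotypic tumors as a fraction of tumors with an unambiguous call."""
    heterotypic = sum(r[3] for r in rows)
    classified = sum(r[2] - r[6] for r in rows)
    return heterotypic / classified


print(rows_are_consistent(ROWS))
print(column_totals(ROWS))
print(heterotypic_fraction(ROWS))
```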

Page 38:

∃ many heterotypic tumors … but why?

H0: random collision

HA: clonal cooperation - recruitment; selection

Page 42:

Key parameters:

N = # initiated clones

δ = collision distance

Induced R.V.'s:

X1 = # isolated clones

X2 = # doublets

X3 = # triplets

# tumors (one mouse): X1 + X2 + X3 + ...

Intractable distribution!!

Page 43:

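But thanks to Armitage, 1949,

\[ E(X_1) \approx m_1 = N \exp(-4\psi) \]

\[ E(X_2) \approx m_2 = 2N \left( \psi - \frac{4\pi + 3\sqrt{3}}{\pi}\, \psi^2 \right) \]

\[ E(X_3) \approx m_3 = N \left( \frac{4 \left( 2\pi + 3\sqrt{3} \right)}{3\pi} \right) \psi^2 \]

where

\[ \psi = \frac{\pi N \delta^2}{4A} \]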
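A minimal sketch evaluating these approximations numerically. It assumes A is the area of the region over which the N clones are scattered, which the slides do not state explicitly. The denominator 3π in m3 is a reconstruction (the slide text is damaged at that point); it is the value for which m1 + 2 m2 + 3 m3 accounts for all N clones up to terms of order ψ^3. Function and variable names are hypothetical.

```python
from math import exp, pi, sqrt


def armitage_expected_counts(N, delta, A):
    """Approximate expected numbers of isolated clones, doublets, and triplets.

    N: number of initiated clones; delta: collision distance;
    A: area of the region (assumed meaning of A).
    Returns (psi, m1, m2, m3).
    """
    psi = pi * N * delta ** 2 / (4 * A)
    m1 = N * exp(-4 * psi)
    m2 = 2 * N * (psi - (4 * pi + 3 * sqrt(3)) / pi * psi ** 2)
    m3 = N * (4 * (2 * pi + 3 * sqrt(3)) / (3 * pi)) * psi ** 2
    return psi, m1, m2, m3


# Hypothetical inputs: choose delta so that psi = 0.01 with N = 100 and A = 1.
N, A = 100, 1.0
delta = sqrt(4 * A * 0.01 / (pi * N))
psi, m1, m2, m3 = armitage_expected_counts(N, delta, A)
print(psi, m1, m2, m3)
print(m1 + 2 * m2 + 3 * m3)  # close to N when psi is small
```

The approximations are expansions in ψ, so they are meant for sparse configurations where ψ is small; for larger ψ the truncated m2 can even turn negative.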

Page 45:

Armitage was studying dust particles … not cancer

Page 46:

Closing the inference loop

• Lineage marking

• Unknown N's

• Extra Poisson variation

Page 47:

Conditional predictive p-values

Page 48:

One form of the past effect

- You are dealing with a statistical problem in a special context.

- You solve it by realizing a new interpretation of an old, interesting, but uncelebrated result, which was developed in a completely different context.

Page 49:

John Tukey, 1915-2000

Jaroslav Hajek, 1926-1974

Peter Armitage, 1924-present

Page 50:

John Tukey, 1915-2000

Jaroslav Hajek, 1926-1974

Peter Armitage, 1924-present

8 943

# citations of key paper

Page 51:

John Tukey, 1915-2000

Jaroslav Hajek, 1926-1974

Peter Armitage, 1924-present

2800 5300415

# citations of a book

Page 53:

“I seem to have been only like a child playing on the seashore, and diverting myself in now and then finding a smoother pebble or a prettier shell than ordinary, whilst the great ocean of truth lay all undiscovered before me.”

- Isaac Newton

Page 54:

Peter Armitage (1924 - present)

- Worked with George Barnard.

- Worked for the Medical Research Council from 1947-61.

- From 1961-76 he was Professor of Medical Statistics at the London School of Hygiene and Tropical Medicine.

- Moved to Oxford as Professor of Biomathematics and became Professor of Applied Statistics and head of the new Department of Statistics, retiring in 1990.

- President of the Royal Statistical Society in 1982-4.
