Multiple Comparisons Measures of LD

Multiple Comparisons Measures of LD Jess Paulus, ScD January 29, 2013


Transcript of Multiple Comparisons Measures of LD

Page 1: Multiple Comparisons Measures of LD

Multiple Comparisons
Measures of LD

Jess Paulus, ScD January 29, 2013

Page 2: Multiple Comparisons Measures of LD

Today’s topics

1. Multiple comparisons
2. Measures of linkage disequilibrium

• D’ and r²

• r² and power

Page 3: Multiple Comparisons Measures of LD

Multiple testing & significance thresholds

Concern about multiple testing

• Standard thresholds (p<0.05) will lead to a large number of “significant” results

• Vast majority of which are false positives

Various approaches to handling this statistically

Page 4: Multiple Comparisons Measures of LD
Page 5: Multiple Comparisons Measures of LD

Possible Errors in Statistical Inference

                                      Unobserved Truth in the Population
Observed in the Sample                Ha: SNP prevents DM                  H0: No association
Reject H0 (SNP prevents DM)           True positive (1 – β)                False positive, Type I error (α)
Fail to reject H0 (No assoc.)         False negative, Type II error (β)    True negative (1 – α)

Page 6: Multiple Comparisons Measures of LD

Probability of Errors

α = “Level of significance”: the probability of a Type I error, i.e. rejecting the null hypothesis when it is in fact true (a false positive); typically set at 5%

p value = The probability, assuming the null hypothesis is true, of obtaining a result as extreme as or more extreme than the one found in your study (that is, by chance alone)

Page 7: Multiple Comparisons Measures of LD

Type I Error (α) in Genetic and Molecular Research

A genome-wide association scan of 500,000 SNPs, none of which is truly associated, is expected to yield:

• 25,000 false positives by chance alone using α = 0.05

• 5,000 false positives by chance alone using α = 0.01

• 500 false positives by chance alone using α = 0.001
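These counts are the expected number of false positives, m × α, when every one of the m null hypotheses is true. A minimal Python sketch of that arithmetic follows; the function name `expected_false_positives` is illustrative and does not come from the slides.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of false positives if every null hypothesis is true."""
    return n_tests * alpha


# Reproduce the three thresholds on the slide for a 500,000-SNP scan.
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: about {round(expected_false_positives(500_000, alpha))} false positives")
```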

Page 8: Multiple Comparisons Measures of LD

Multiple Comparisons Problem

The multiple comparisons (or "multiple testing") problem occurs when one considers a set, or family, of statistical inferences simultaneously

• Type I errors are more likely to occur

• Several statistical techniques have been developed to attempt to adjust for multiple comparisons
  • Bonferroni adjustment
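To see how quickly Type I errors accumulate, one common summary is the family-wise error rate: the probability of at least one false positive among m tests. The sketch below assumes the m tests are independent and all null hypotheses are true, which gives 1 − (1 − α)^m. The independence assumption and the function name are additions for illustration; they are not stated on the slide.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across n_tests independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


for m in (1, 10, 100):
    print(m, round(familywise_error_rate(m), 3))
```

With a single test the rate equals α; with 10 independent tests at α = 0.05 it is already about 0.40.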

Page 9: Multiple Comparisons Measures of LD

Adjusting alpha

Standard Bonferroni correction

• Test each SNP at the α* = α/m1 level

• Where m1 = number of markers tested

• Assuming m1 = 500,000, a Bonferroni-corrected threshold of α* = 0.05/500,000 = 1 × 10⁻⁷

• Conservative when the tests are correlated

Permutation or simulation procedures may increase power by accounting for test correlation
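The Bonferroni rule can be sketched in a few lines of Python. The helper names and the three p-values in the usage example are hypothetical; only the α/m rule and the 500,000-marker figure come from the slide.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level alpha* = alpha / m."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that is at or below the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# The slide's example: 500,000 markers at an overall alpha of 0.05.
assert math.isclose(bonferroni_threshold(0.05, 500_000), 1e-7)

# Hypothetical p-values for three tests; the corrected threshold is 0.05 / 3.
print(significant_after_bonferroni([0.001, 0.02, 0.04]))
```

This sketch covers only the plain Bonferroni correction; it does not implement the permutation or simulation procedures mentioned above.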

Page 10: Multiple Comparisons Measures of LD

Measures of LD

Jess Paulus, ScD January 29, 2013

Page 11: Multiple Comparisons Measures of LD

Haplotype definition

• Haplotype: an ordered sequence of alleles at a subset of loci along a chromosome

• Moving from examining single genetic markers to sets of markers

Page 12: Multiple Comparisons Measures of LD

Measures of linkage disequilibrium

Basic data: table of haplotype frequencies

16 sampled haplotypes (alleles at two loci, A/a and G/g):
AG ag AG ag Ag AG ag AG AG ag AG Ag ag AG ag AG

        A       a       Total
G       8       0       50%
g       2       6       50%
Total   62.5%   37.5%
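The 2 × 2 table is just a tally of the 16 haplotypes. A short Python sketch of that tally, using only the standard library (the variable names are illustrative):

```python
from collections import Counter

# The 16 haplotypes from the slide (first letter: A/a locus, second letter: G/g locus).
haplotypes = "AG ag AG ag Ag AG ag AG AG ag AG Ag ag AG ag AG".split()

counts = Counter(haplotypes)  # a haplotype that never occurs (aG) counts as 0
n = len(haplotypes)

# Marginal allele frequencies.
freq_A = sum(c for h, c in counts.items() if h[0] == "A") / n
freq_G = sum(c for h, c in counts.items() if h[1] == "G") / n

# Print the table row by row: cell counts for columns A and a, then the row share.
for row in ("G", "g"):
    cells = [counts[col + row] for col in ("A", "a")]
    print(row, cells, f"{sum(cells) / n:.1%}")
print("A:", f"{freq_A:.1%}", "a:", f"{1 - freq_A:.1%}")
```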

Page 13: Multiple Comparisons Measures of LD

D’ and r² are most common

• Both measure correlation between two loci

• D prime …
  • ranges from 0 [no LD] to 1 [complete LD]

• R squared …
  • also ranges from 0 to 1
  • is the squared correlation between alleles on the same chromosome

Page 14: Multiple Comparisons Measures of LD

D

• The deviation of the observed frequency of a haplotype from the frequency expected under independence (the product of the allele frequencies) is a quantity called the linkage disequilibrium (D)

• If two alleles are in LD, it means D ≠ 0

• If D’ = 1, there is complete dependency between loci

• Linkage equilibrium means D = 0
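As a worked illustration of this definition, write p_AG for the observed frequency of the AG haplotype and p_A, p_G for the two allele frequencies. This notation is an assumption, since the slides give no explicit formula for D. Under linkage equilibrium the expected haplotype frequency is p_A p_G, so for the 16-haplotype example in these slides (8 AG out of 16, A at 62.5%, G at 50%):

```latex
D = p_{AG} - p_A\,p_G
  = \frac{8}{16} - \frac{10}{16}\cdot\frac{8}{16}
  = 0.5 - 0.3125
  = 0.1875
```

Since D is not 0, the two loci in that example are in LD.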

Page 15: Multiple Comparisons Measures of LD

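        A       a       Total
G       n11     n10     n1·
g       n01     n00     n0·
Total   n·1     n·0

(n1·, n0· are the row totals; n·1, n·0 are the column totals)

Measure         Formula                                                                                                   Ref.
D’              $\frac{n_{11}n_{00} - n_{10}n_{01}}{\min(n_{1\cdot}n_{\cdot 0},\; n_{0\cdot}n_{\cdot 1})}$                Lewontin (1964)
Δ² = r²         $\frac{(n_{11}n_{00} - n_{10}n_{01})^2}{n_{1\cdot}\,n_{0\cdot}\,n_{\cdot 1}\,n_{\cdot 0}}$                Hill and Weir (1994)
δ*              numerator $n_{11}n_{00} - n_{10}n_{01}$; denominator garbled in source                                    Levin (1953)
(symbol lost)   $\frac{n_{11}n_{00}}{n_{10}n_{01}}$                                                                       Edwards (1963)
Q               $\frac{n_{11}n_{00} - n_{10}n_{01}}{n_{11}n_{00} + n_{10}n_{01}}$                                         Yule (1900)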

Page 16: Multiple Comparisons Measures of LD

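16 haplotypes: 8 AG, 2 Ag, 0 aG, 6 ag

        A       a       Total
G       8       0       50%
g       2       6       50%
Total   62.5%   37.5%

$D' = \frac{n_{11}n_{00} - n_{10}n_{01}}{\min(n_{1\cdot}n_{\cdot 0},\; n_{0\cdot}n_{\cdot 1})} = \frac{8 \times 6 - 0 \times 2}{8 \times 6} = 1$

$r^2 = \frac{(n_{11}n_{00} - n_{10}n_{01})^2}{n_{1\cdot}\,n_{0\cdot}\,n_{\cdot 1}\,n_{\cdot 0}} = \frac{(8 \times 6 - 0 \times 2)^2}{8 \times 8 \times 10 \times 6} = 0.6$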
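The same calculation can be sketched in Python directly from the four cell counts. The function name `ld_measures` is illustrative. The branch for a negative cross-product uses the usual Lewontin normalisation, which is an assumption here because the slides show only the formula for the non-negative case.

```python
import math


def ld_measures(n11: int, n10: int, n01: int, n00: int):
    """Return (D', r^2) from 2x2 haplotype counts laid out as on the slide.

    n11 = AG, n10 = aG, n01 = Ag, n00 = ag.
    """
    row1, row0 = n11 + n10, n01 + n00  # totals for G and g
    col1, col0 = n11 + n01, n10 + n00  # totals for A and a
    if min(row1, row0, col1, col0) == 0:
        raise ValueError("both alleles must be observed at each locus")

    cross = n11 * n00 - n10 * n01
    if cross >= 0:
        # Denominator as shown on the slide.
        d_max = min(row1 * col0, row0 * col1)
    else:
        # Assumed standard normalisation when the cross-product is negative.
        d_max = min(row1 * col1, row0 * col0)

    d_prime = abs(cross) / d_max
    r2 = cross ** 2 / (row1 * row0 * col1 * col0)
    return d_prime, r2


# Worked example from the slide: 8 AG, 0 aG, 2 Ag, 6 ag.
d_prime, r2 = ld_measures(8, 0, 2, 6)
print(d_prime, round(r2, 3))
```

In the example D’ reaches 1 because one of the four haplotypes (aG) is never observed, while r² stays at 0.6 because the allele frequencies at the two loci differ.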

Page 17: Multiple Comparisons Measures of LD

r² and power

• r² is directly related to study power
  • A low r² means that a larger sample size is required to detect the LD between the markers

• r² × N is the “effective sample size”
  • If a marker M and causal gene G are in LD, then a study with N cases and controls which measures M (but not G) will have the same power to detect an association as a study with r² × N cases and controls that directly measured G

Page 18: Multiple Comparisons Measures of LD

r² and power

Example:

• N = 1000 (500 cases and 500 controls)

• r² = 0.4

• If you had genotyped the causal gene directly, you would only need a total N = 400 (200 cases and 200 controls)
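The example is a single multiplication, r² × N, and the relation can also be read in reverse to ask how many subjects a marker-based study needs. A minimal Python sketch follows; both function names are illustrative and do not come from the slides.

```python
import math


def effective_sample_size(n_marker: float, r2: float) -> float:
    """Sample size of a direct-genotyping study with the same power: r^2 * N."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie between 0 and 1")
    return r2 * n_marker


def marker_sample_size_needed(n_direct: float, r2: float) -> float:
    """Inverse relation: N needed when typing the marker instead of the causal variant."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be greater than 0 and at most 1")
    return n_direct / r2


# Slide example: N = 1000 genotyped at the marker, r^2 = 0.4.
print(effective_sample_size(1000, 0.4))
print(marker_sample_size_needed(400, 0.4))
```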

Page 19: Multiple Comparisons Measures of LD

Today’s topics

1. Multiple comparisons
2. Measures of linkage disequilibrium

• D’ and r²

• r² and power