Study Design in Human Genetics and Genomics Gary Beecham, PhD John P. Hussman Institute for Human Genomics University of Miami, Miller School of Medicine

Page 1

Study Design in Human Genetics and Genomics

Gary Beecham, PhD
John P. Hussman Institute for Human Genomics
University of Miami, Miller School of Medicine

Page 2

Introduction

Primary Purpose of Genomics:
• Discover mechanisms underlying disease, to predict, to prevent, and to treat human disease

Page 3

Introduction: Central Dogma

• DNA
  – Transcription
• Messenger RNA (mRNA)
  – Translation
• Proteins

Page 4

Introduction: Central Dogma

Pre-Transcription
• Structural (chromatin)
• Methylation
• Small regulatory RNAs

Post-Transcription
• Splicing
• polyA, capping
• RNA degradation

Page 5

Introduction

Purpose:
• Discover mechanisms underlying disease, to predict, to prevent, and to treat human disease

Hypothesis
• DNA, RNA, or other regulatory changes (e.g., miRNA, epigenetic factors) lead to altered proteins, altered abundance of proteins, or altered regulation of proteins, thereby influencing disease

Page 6

Introduction

We are focusing on DNA:
• DNA is the building block
• Inherited
• Cheaper & easier to assess
• DNA is the primary focus for much research

Page 7

Introduction: DNA variation

ACCCTTGAAAAGCTGATGAAGGCATTCGAG     (reference)
ACCCTTGAAAAGCGGATGAAGGCATTCGAG     SNP/SNV: single nucleotide polymorphism/variant
ACCCTTGAAAAGC-GATGAAGGCATTCGAG     Deletion (“indel”)
ACCCTTGAAAAGCTAGATGAAGGCATTCGAG    Insertion (“indel”)
ACCCTTGAAAAGCTGATGATGAAGGCATTCG    CNV/SV: copy number variant, structural variant

Specific “types” at specific loci are known as ALLELES; invariant loci are said to be monomorphic.

Page 8

Introduction: DNA variation

[Figure: paired paternal and maternal chromosomes, labeled to show allele 1 and allele 2, genotype 1 and genotype 2, haplotypes, SNPs/markers, and a disease variant/QTL (quantitative trait locus).]

Page 9

Introduction: Refining the Hypothesis

Hypotheses
• DNA, RNA, or other regulatory changes (e.g., miRNA, epigenetic factors) lead to altered proteins, altered abundance of proteins, or altered regulation of proteins, thereby influencing disease
• Certain ALLELES on particular HAPLOTYPES/CHROMOSOMES lead to altered proteins, altered abundance of proteins, or altered regulation of proteins, thereby influencing disease
• Certain ALLELES/GENOTYPES are more common among those with disease than those without

Page 10

Introduction

Primary Research Questions
• Are some genomic regions linked with disease phenotypes in families?
• Are some alleles associated with disease phenotypes?

Linkage and Association

Page 11

Linkage vs Association

• Linkage analysis: co-segregation of a region/locus with disease through families
  – qualitative and quantitative traits
  – big or small families
• Association analysis: correlation of alleles with disease across populations/families
  – qualitative and quantitative traits
  – populations or small families

Page 12

Linkage vs Association

[Figure: pedigrees with marker genotypes (alleles 1 to 4) illustrating the two cases below.]

LINKAGE: Same marker, different allele (region)

LINKAGE + ASSOCIATION: Same marker, same allele (haplotype)

Page 13

Design Considerations

• Disease Type: Simple vs Complex
• Mechanism Hypothesis: Common vs Rare Variation
• Sample Collection: Families vs Population
• Scope: Genome vs Candidate
• Specificity: High vs Low

Page 14

Design Considerations: Disease Type

SIMPLE DISEASE:
• Typically very rare, very severe, with “Mendelian” inheritance patterns; often “on” vs “off” diseases, syndromes, often earlier onset
• Typically ONE or very few genetic causes of disease (simple etiology)
• Examples
  – Autosomal Recessive: cystic fibrosis, sickle cell disease, Tay-Sachs, phenylketonuria
  – Autosomal Dominant: Huntington disease, achondroplasia, familial hypercholesterolaemia
  – Sex-linked: Fragile X, haemophilia A, Duchenne muscular dystrophy

Page 15

Design Considerations: Disease Type

COMPLEX DISEASE:
• Typically more common, less “severe”, with complex (or no clear) inheritance pattern. Less “on” vs “off”; complex and/or progressive disease course; often later or varied onset
• Generally POLYGENIC with important environmental and interaction effects (i.e., complex etiology)
• Examples
  – Late-onset Alzheimer disease
  – Parkinson disease
  – Cardiovascular disease
  – Dyslipidemia
  – Multiple sclerosis

Page 16

Design Considerations: Mechanism

Common Disease – Common Variant (CDCV)

• Most influences on common complex diseases are due to common polymorphisms (> 1% allele frequency)

• Basis for use of association methods to find complex disease trait loci; linkage will not work

Common Disease – Rare Variant (CDRV)

• Common diseases are caused by mixture of common and rare alleles

• Some gene associations might reflect aggregates of rare alleles

• Linkage and association both work and don’t work, in different ways

Bodmer & Bonilla, Nature Genetics 2008;40:695-701

Rare Disease – Rare Variant (RDRV)

• Most influences on rare, simple diseases are due to rare variants of strong effect

• Linkage will work; association may or may not depending on disease and allele frequencies and sample size

Page 17

Design Considerations: Sample Collections

[Figure: sample collection designs arranged along an arrow running from “association” to “linkage and/or association”.]

• SINGLE AFFECTED: Case-Control, Trios, Case Only
• RELATIVE PAIRS: Sibpair, Twins, Avuncular
• EXTENDED FAMILIES

Page 18

Design Considerations: Scope

Genome-Wide
• Look everywhere
  – “unbiased”
• Millions of data points per person
• Millions of tests
  – Need larger samples or stronger effect sizes to detect
• Generally more expensive (but more data per $)

Candidate Region
• Focus on one place
  – “targeted”
  – Biological or locational candidate
• Fewer tests
  – Smaller samples, smaller effect sizes
  – Larger chance of missing an effect
• Generally less expensive (less data per $)

Page 19

Design Considerations: Specificity

• Linkage methods: low specificity; that is, they typically identify very broad regions of the genome

• Association methods: high specificity; they typically identify very narrow regions of the genome, relative to linkage.

Page 20

Design Considerations: Specificity

Beecham et al., 2015, Neurology

Page 21

Linkage Disequilibrium

• Linkage Equilibrium: alleles at different loci are inherited independently

• Linkage Disequilibrium: alleles at different loci are not independent, and inheritance at one locus is correlated with inheritance at the other

Page 22

Linkage Disequilibrium

• The “A” allele (locus 1) tends to be inherited with the “B” allele (locus 2)
• The event “gamete carries A” is not independent of the event “gamete carries B”
• Locus 1 and 2 are not independent; they are in linkage disequilibrium

• The “A” allele is not preferentially inherited with “Z” or “z”
• The event “gamete carries A” is independent of the event “gamete carries Z”
• Locus 1 and 3 are independent; they are in linkage equilibrium

[Figure: chromosomes carrying alleles at loci 1, 2, and 3. At loci 1 and 2 only the AB and ab haplotypes appear; at loci 1 and 3 the haplotypes AZ, Az, aZ, and az all appear.]
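The independence statements on this slide can be made concrete with the disequilibrium coefficient D = P(AB) - P(A)P(B), which is zero when the two loci are in linkage equilibrium. The following is a minimal sketch only: the function name and the haplotype counts are hypothetical and are not taken from the slide's figure.

```python
def ld_coefficient(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Return D = P(AB) - P(A) * P(B) from counts of the four two-locus haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype must be observed")
    p_AB = n_AB / total              # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total      # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total      # frequency of allele B at locus 2
    return p_AB - p_A * p_B


# Hypothetical counts: only AB and ab haplotypes observed (A travels with B).
d_linked = ld_coefficient(50, 0, 0, 50)

# Hypothetical counts: all four haplotypes equally common (independent loci).
d_independent = ld_coefficient(25, 25, 25, 25)

print(d_linked, d_independent)
```

A non-zero D corresponds to the "not independent" case on the slide, and D = 0 to the linkage-equilibrium case.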

Page 23

Linkage Disequilibrium: New Mutation

[Figure: the same chromosomes at loci 1, 2, and 3, with wild-type allele “w” at locus 1. A new disease mutation “D” first appears on a single chromosome (haplotypes DB and DZ), then spreads to additional chromosomes.]

Page 24

Linkage Disequilibrium

• All new mutations are in complete LD with everything on the initial chromosome

• So, why aren’t entire chromosomes linked to themselves?

• How does linkage disequilibrium decay over time?

• RECOMBINATION

Page 25

Decay of LD

[Figure: haplotypes at Locus 1 and Locus 2.
 Ancestral haplotypes: A-G and C-G.
 MUTATION: a new T allele arises at Locus 2 on a C chromosome.
 After mutation event: haplotypes A-G, C-G, C-T. Complete LD between C and T.
 RECOMBINATION: haplotypes A-G, C-G, C-T, A-T. Recombinant haplotype; incomplete LD.]

Page 26

Decay of LD

• In a large, random-mating population, in the absence of mutation, migration and selection:

  $D_t = D_0 (1 - \theta)^t$

• $D_t$ = disequilibrium coefficient after t generations
• $D_0$ = disequilibrium coefficient in initial generation
• $\theta$ = recombination fraction

[Figure: disequilibrium (vertical axis, 0.0 to 1.0) against generation t (horizontal axis, 0 to 10), with one curve for each of θ = 0, θ = 0.01, θ = 0.1, and θ = 0.5.]
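The decay formula D_t = D_0 (1 - θ)^t can be evaluated directly for the four recombination fractions shown on the plot. This is a small illustrative sketch; the function names are hypothetical, and the starting value D_0 = 1.0 is an assumption chosen so the output reads as the fraction of the initial disequilibrium that remains.

```python
import math


def ld_after(d0: float, theta: float, t: int) -> float:
    """Return D_t = D_0 * (1 - theta)**t for recombination fraction theta."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return d0 * (1.0 - theta) ** t


def generations_to_fraction(theta: float, fraction: float) -> float:
    """Generations needed for D_t / D_0 to fall to `fraction` (requires theta > 0)."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    return math.log(fraction) / math.log(1.0 - theta)


# Remaining disequilibrium at generations 0, 2, ..., 10 for each theta on the plot.
for theta in (0.0, 0.01, 0.1, 0.5):
    curve = [round(ld_after(1.0, theta, t), 3) for t in range(0, 11, 2)]
    print(f"theta={theta}: {curve}")
```

With θ = 0 the disequilibrium never decays, while unlinked loci (θ = 0.5) lose half of it every generation, which is why tightly linked markers remain useful proxies for nearby disease loci.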

Page 27

What does this have to do with association and linkage?

• Linkage and Association BOTH rely on this idea of LD decay

• Linkage relies on the decay of chromosomal haplotypes within families due to meioses between relatives

• Association relies on the decay of haplotypes across individuals due to the cumulative meioses throughout the entire population

• MARKER loci are tested; positive tests indicate a disease locus is “near” the marker locus.

Page 28

Design Considerations: Specificity

[Figure comparing LINKAGE and ASSOCIATION results. Beecham et al., 2015, Neurology]

Page 29

[Figure repeated: sample collection designs (SINGLE AFFECTED: Case-Control, Trios, Case Only; RELATIVE PAIRS: Sibpair, Twins, Avuncular; EXTENDED FAMILIES) along an arrow from “association” to “linkage and/or association”.]

Page 30

[Same figure, with the family-based designs highlighted.]

Linkage Analyses

Page 31

Types of Linkage Analysis

• Parametric – LOD score – genetic model is specified
  – Inheritance pattern (e.g., dominant, recessive, additive, etc.)
  – Penetrance (e.g., strength of genetic effect)
  – Disease allele frequency
• Non-parametric – affecteds only; e.g., affected sibling pair, affected relative pair – genetic model is not specified
  – Analyzes allele sharing in affected relatives
  – Unaffected relatives usually only used to establish phase of alleles

Page 32

Parametric Linkage: LOD scores

LOD score (Z):

• Z > 3: significant evidence FOR linkage
• Z < -2: significant evidence FOR non-linkage
• -2 < Z < 3: insufficient evidence for either linkage or non-linkage
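For reference, the LOD score is conventionally defined as Z(θ) = log10[ L(θ) / L(θ = 0.5) ], the base-10 log of the likelihood at recombination fraction θ relative to the likelihood under free recombination. The sketch below covers only the simplest case, which is an assumption made here for illustration: phase-known, fully informative meioses, where L(θ) = θ^r (1 - θ)^(n - r) for r recombinants among n meioses. The function name and the example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    Z(theta) = log10( theta**r * (1 - theta)**(n - r) / 0.5**n )
    """
    r, n = recombinants, meioses
    if not 0 <= r <= n:
        raise ValueError("recombinants must be between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0:
        # Any observed recombinant is impossible under theta = 0.
        return n * math.log10(2.0) if r == 0 else float("-inf")
    log_l_theta = r * math.log10(theta) + (n - r) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical family data: 1 recombinant among 10 informative meioses.
for theta in (0.01, 0.05, 0.1, 0.2, 0.3, 0.5):
    print(f"theta={theta}: Z={lod_score(1, 10, theta):.3f}")
```

Ten meioses with no recombinants give Z = 10 · log10(2), just over the conventional threshold of 3, which is one way to see why small families rarely reach significance on their own.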

Page 33

Parametric Linkage

• That’s the easy part!
  – The Hard Part: determining recombinants, non-recombinants, determining phase, dealing with unknown phase, multipoint vs two-point, determining models, etc.
• For more detail: http://hihg.med.miami.edu/educational-programs/online-genetics-courses/
• Software: MERLIN (Abecasis)

Page 34

Nonparametric Linkage Analysis

• If the same disease gene causes disease in both members of a relative pair, then the relatives should have inherited the same alleles of genetic markers near that gene more often than would be expected by chance alone (Penrose 1935, Suarez et al. 1978).
• This approach to linkage analysis makes no assumption about the inheritance pattern.
• Ignores unaffected family members
• ASP (Affected Sib-Pair)
• ARP (Affected Relative-Pair)

Page 35

IBD versus IBS

• Identical by Descent (IBD) sharing
  – relative pairs have inherited the same allele from a common ancestor
  – we can trace that allele from a common ancestor down the family tree to the descendants
• Identical by State (IBS) sharing
  – pairs share the same allele TYPE regardless of ancestral origin
  – unrelated people can share alleles IBS

Page 36

IBD and IBS

We infer IBD status using IBS and relationship information.

– The parents share no alleles IBD, one allele IBS.
– The daughters share two alleles IBD (and IBS).
– If the parents were not genotyped, the daughters’ IBD state could be 0, 1, or 2.

Page 37

Inferring IBD: Parental Genotypes Known

Page 38

Inferring IBD: Parental Genotypes Unknown

• Calculate the probability of all possible parental genotype mating types.
  – 11 x 23 → Pr(IBD = 0) = ½, Pr(IBD = 1) = ½
  – 12 x 13 → Pr(IBD = 0) = 1
  – 1_ x 23 → Pr(IBD = 1) = 1, where _ denotes any allele other than allele 1
• Add up all probabilities for IBD = 0 and IBD = 1
• Different allele frequencies will result in different probabilities
• Adding additional siblings can reduce uncertainty

Page 39

Affected Sibling Pair (ASP) linkage tests

• Determine IBD sharing for each sibling pair.
• Tests:
  – Chi-squared goodness of fit test: examine deviations of observed from expected IBD distribution
  – Means test: compare the mean observed number of alleles shared IBD to the expected number (i.e., 1)
  – Two allele test: compare the observed number of pairs sharing 2 alleles IBD with that expected
• Generally the means test is the most powerful
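The three ASP tests listed above can be sketched from counts of sib pairs sharing 0, 1, or 2 alleles IBD. Under no linkage, sibs share 0, 1, or 2 alleles IBD with probabilities 1/4, 1/2, and 1/4, so the number shared has mean 1 and variance 1/2. The code below is an illustrative sketch using large-sample normal and chi-squared approximations; the function name, the returned keys, and the example counts are hypothetical, and IBD status is assumed to be known for every pair.

```python
import math


def asp_tests(n0: int, n1: int, n2: int) -> dict:
    """ASP linkage statistics from counts of pairs sharing 0, 1, 2 alleles IBD."""
    n = n0 + n1 + n2
    if n == 0:
        raise ValueError("need at least one sib pair")

    # Goodness of fit (2 df): observed vs expected n/4, n/2, n/4.
    expected = (n / 4, n / 2, n / 4)
    gof = sum((o - e) ** 2 / e for o, e in zip((n0, n1, n2), expected))

    # Means test: mean alleles shared IBD vs 1 (null variance per pair = 1/2).
    mean_shared = (n1 + 2 * n2) / n
    z_means = (mean_shared - 1.0) / math.sqrt(0.5 / n)

    # Two-allele test: pairs sharing 2 alleles vs n/4 (binomial variance 3n/16).
    z_two = (n2 - n / 4) / math.sqrt(3 * n / 16)

    def upper_tail(z: float) -> float:
        """One-sided p-value from the standard normal upper tail."""
        return 0.5 * math.erfc(z / math.sqrt(2.0))

    return {
        "gof_chi2_2df": gof,
        "z_means": z_means,
        "p_means": upper_tail(z_means),
        "z_two_allele": z_two,
        "p_two_allele": upper_tail(z_two),
    }


# Hypothetical data: 100 affected sib pairs with excess IBD sharing.
result = asp_tests(n0=10, n1=50, n2=40)
for name, value in result.items():
    print(f"{name}: {value:.4g}")
```

The means and two-allele tests are one-sided because linkage is expected to increase, not decrease, sharing among affected pairs.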

Page 40

ASP tests

ADVANTAGES
• ASPs may be easier to collect than large extended pedigrees, especially for late onset disorders
• Has reasonable power in the presence of genetic heterogeneity, provided that at least one gene has a detectable effect
• Uses only affected individuals, thus non-penetrant gene carriers do not reduce power

DISADVANTAGES
• Most tests require that IBD status be known. Pairs in which IBD status is unknown cannot be used.
  – Requires parents or enough siblings so that parental genotypes can be inferred unambiguously
• May require large number of affected sibpairs to achieve reasonable power.

Page 41

Affected Relative Pair Analysis

• Like affected sibpair analysis, it does not require assumptions about:
  – Mode of inheritance
  – Disease allele frequency
  – Penetrance
• Unlike affected sibpair analysis, it uses all affected relatives regardless of relationship
  – Not restricted to affected sibpairs
  – Extracts more information from extended pedigrees
• Prefer to use all data possible
• Common approach: NPL statistic (Kruglyak et al., 1996) and extensions
• MERLIN (Abecasis et al.) is a frequently used implementation

Page 43

[Figure repeated: sample collection designs (SINGLE AFFECTED: Case-Control, Trios, Case Only; RELATIVE PAIRS: Sibpair, Twins, Avuncular; EXTENDED FAMILIES) along an arrow from “association” to “linkage and/or association”.]

Page 44

[Same figure.]

Association Analyses

Page 45

Direct and Indirect allelic association

[Figure: a genotyped MARKER allele and a causal allele on the same haplotype, correlated through LD. Direct association connects the causal allele to the phenotype; indirect association connects the marker allele to the phenotype. Ref: Balding. Nature Reviews. Vol 7. Oct 2006]

• Direct approach requires some prior knowledge of the variant

• Indirect approach is agnostic with relation to functional relevance

Page 46

Why test for association and not linkage?

• Resolution of mapping
  – Meioses within families are limited. Linkage analysis generally identifies large regions
  – Association analysis takes advantage of historical meioses. Better suited for fine-mapping
• Cost
  – High-throughput genotyping makes genome-wide association studies (GWAS) competitive with linkage screens
• Association analysis is a well-suited approach for investigating the common disease common variant (CDCV) hypothesis, through indirect association
• Direct association is becoming increasingly feasible with cheaper next-generation sequencing technologies (CDRV), but sample size is still an issue

Page 47

Linkage Disequilibrium

[Figure repeated from the Decay of LD slide: ancestral haplotypes A-G and C-G at Locus 1 and Locus 2; after the MUTATION event the new C-T haplotype shows complete LD between C and T; after RECOMBINATION the A-T recombinant haplotype appears and LD is incomplete.]

Page 48

Linkage disequilibrium in populations

[Figure: an ancestral chromosome and a sampling of different chromosomes at the present day.]

Tests at marker (tag) SNPs on the ancestral chromosome become proxy tests for disease loci on the ancestral chromosome (indirect association)

Page 49

Molecular Psychiatry (2005) 10, 328

Page 50

Case-control tests: single SNP

• Samples of unrelated individuals
  – Cases = Affected
  – Controls = Unaffected
• Observe genotypes at a locus (loci)
  – M1M1 (11: reference homozygote)
  – M1M2 (12: heterozygote)
  – M2M2 (22: non-reference homozygote)
• Two general categories of test statistics
  – Chi-squared Tests
  – Logistic Regression

Page 51

Case-control tests: Chi-squared tests

• Allele-based
• H0: p1|case = p1|control
• Tends to have good power under a variety of genetic models
• Requires Hardy-Weinberg Equilibrium

Allele counts:

          Cases   Controls   Total
  M1      n1A     n1U        n1
  M2      n2A     n2U        n2
  Total   nA      nU         N

When the numbers of case and control alleles are equal (nA = nU):

  $T_{cc} = \sum_{i=1}^{2} \frac{(n_{iA} - n_{iU})^2}{n_{iA} + n_{iU}}$

Tcc ~ Chi-squared with 1 df

Page 52

Case-control tests: Chi-squared tests

• When the numbers of cases and controls are not equal (i.e., nA ≠ nU)

Observed Counts:

          Cases   Controls   Total
  M1      n1A     n1U        n1
  M2      n2A     n2U        n2
  Total   nA      nU         N

Expected Counts:

          Cases      Controls   Total
  M1      nA·n1/N    nU·n1/N    n1
  M2      nA·n2/N    nU·n2/N    n2
  Total   nA         nU         N

  $T_{cc} = \sum_{\text{cells}} \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$

Tcc ~ Chi-squared with 1 df

Page 53

Case-Control Test Cont.

• Example: Testing for association of APOE-4 and late-onset Alzheimer’s disease

Observed Counts:

           Cases   Controls   Total
  e4       240     60         300
  Not e4   360     340        700
  Total    600     400        1000

Expected Counts:

           Cases   Controls   Total
  e4       180     120        300
  Not e4   420     280        700
  Total    600     400        1000

  $T_{cc} = \frac{(240 - 180)^2}{180} + \frac{(60 - 120)^2}{120} + \frac{(360 - 420)^2}{420} + \frac{(340 - 280)^2}{280} \approx 71.4$

p-value < $10^{-16}$
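The APOE example can be checked numerically: expected counts come from the row and column totals, and the statistic sums (Obs - Exp)²/Exp over the four cells. This is a minimal standard-library sketch; the function names are hypothetical, and the p-value uses the identity that a 1-df chi-squared upper tail equals erfc(sqrt(x/2)).

```python
import math


def chi_squared_2x2(table):
    """Return (statistic, expected) for a 2x2 table given as [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count for each cell = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return statistic, expected


def p_value_1df(statistic: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Rows: e4, not e4. Columns: cases, controls (counts from the slide).
observed = [[240, 60], [360, 340]]
t_cc, expected = chi_squared_2x2(observed)
print("expected:", expected)
print(f"Tcc = {t_cc:.2f}, p = {p_value_1df(t_cc):.2e}")
```

The same function applies to any allele-by-status table, including the unequal-sample-size case described on the preceding slide.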

Page 54

Case-control tests: Chi-squared tests

• Genotype-based
  H0: p11|case = p11|control and p12|case = p12|control and p22|case = p22|control
  – 2 df test
  – Can also test a specific genotype
  – Does not require Hardy-Weinberg Equilibrium
• Armitage’s Trend Test (Cochran-Armitage)
  – Restricted alternative hypothesis assumes additive effect on 12 genotype (between 11 and 22).
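The trend test can be sketched from genotype counts using additive scores 0, 1, 2 for the three genotypes. The statistic below follows the standard Cochran-Armitage form T² / Var(T), with T = Σ w_i (r_i S - s_i R), where r_i and s_i are case and control counts for genotype i and R and S are the case and control totals. The function name, the choice of scores, and the example counts are assumptions made here for illustration.

```python
import math


def armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend chi-squared (1 df) from per-genotype counts.

    `cases` and `controls` hold counts for genotypes in the same order as
    `weights` (additive scores by default).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    if R == 0 or S == 0:
        raise ValueError("need at least one case and one control")
    t_stat = sum(w * (r * S - s * R) for w, r, s in zip(weights, cases, controls))
    sum_w2n = sum(w * w * n for w, n in zip(weights, totals))
    sum_wn = sum(w * n for w, n in zip(weights, totals))
    variance = (R * S / N) * (N * sum_w2n - sum_wn ** 2)
    if variance == 0:
        raise ValueError("genotype scores do not vary in this sample")
    return t_stat ** 2 / variance


# Hypothetical genotype counts for 100 cases and 100 controls.
chi2 = armitage_trend(cases=(20, 50, 30), controls=(30, 50, 20))
p_value = math.erfc(math.sqrt(chi2 / 2.0))  # 1-df chi-squared upper tail
print(f"trend chi-squared = {chi2:.3f}, p = {p_value:.4f}")
```

Because it spends a single degree of freedom on the additive alternative, the trend test is typically more powerful than the 2 df genotype test when the heterozygote risk really does lie between the two homozygotes.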

Page 55

Case-control tests: Logistic Regression

• Models log odds of being affected given risk factors (e.g., genotypes)
  – Full genotype model

    $\ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$

    $\beta_1 X_{1i}$: term for 11 vs 22
    $\beta_2 X_{2i}$: term for 12 vs 22

  – H0: $\beta_1 = \beta_2 = 0$

$p_i$ = probability that the ith individual is affected
$X_{1i}$ = 1 if genotype 11 and 0 otherwise; $X_{2i}$ = 1 if genotype 12 and 0 otherwise

Page 56

Case-control tests: Logistic Regression

• Linear model

    $\ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 X_{1i}$

    $\beta_1 X_{1i}$: term for linear genotype effect
    $X_{1i}$ = count of allele 1

  – H0: $\beta_1 = 0$
  – HA: $\beta_1 \neq 0$

• Can incorporate covariates, environmental factors and test interactions

Page 57

Estimation of Odds Ratios

• Can estimate effects, not just test them!
• In general genotype model
  – $e^{\hat{\beta}_1}$: estimates the ratio of the odds of being affected given genotype 11 relative to the odds of being affected given genotype 22
  – $e^{\hat{\beta}_2}$: estimates the ratio of the odds of being affected given genotype 12 relative to the odds of being affected given genotype 22
• Confidence intervals are computed in standard statistical packages
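When the genotype model has no covariates, the fitted odds ratio for genotype 11 versus 22 equals the odds ratio computed directly from the corresponding cells of the genotype-by-status table. The sketch below computes that ratio with a Wald (Woolf) 95% confidence interval on the log scale. The function name and the counts are hypothetical, and this shortcut is not a substitute for fitting the full regression when covariates are included.

```python
import math


def odds_ratio_ci(case_geno, control_geno, case_ref, control_ref, z=1.96):
    """Odds ratio for a genotype vs the reference genotype, with a Wald CI.

    case_geno / control_geno: cases and controls carrying the genotype of interest.
    case_ref / control_ref: cases and controls carrying the reference genotype.
    """
    counts = (case_geno, control_geno, case_ref, control_ref)
    if min(counts) <= 0:
        raise ValueError("all four counts must be positive")
    estimate = (case_geno * control_ref) / (control_geno * case_ref)
    # Standard error of the log odds ratio (Woolf's formula).
    se_log = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: genotype 11 (30 cases, 10 controls) vs 22 (20 cases, 40 controls).
or_11_vs_22, ci_low, ci_high = odds_ratio_ci(30, 10, 20, 40)
print(f"OR = {or_11_vs_22:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes 1 corresponds to rejecting the null hypothesis of no effect for that genotype at roughly the 5% level.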

Page 58

Quantitative Trait Tests

• Linear model

    $y_i = \beta_0 + \beta_1 X_{1i}$

    $\beta_1 X_{1i}$: term for linear genotype effect
    $X_{1i}$ = count of allele 1

  – H0: $\beta_1 = 0$
  – HA: $\beta_1 \neq 0$
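The linear quantitative-trait model can be fit by ordinary least squares, regressing the trait on the count of allele 1. The sketch below uses the closed-form single-predictor solution and returns the slope's standard error, so that slope / se can be compared with a t distribution on n - 2 degrees of freedom. The function name and the data are hypothetical, and no covariates are included.

```python
import math


def allele_dosage_regression(dosage, trait):
    """Fit trait = b0 + b1 * dosage by least squares; return (b0, b1, se_b1)."""
    n = len(dosage)
    if n != len(trait) or n < 3:
        raise ValueError("need matching inputs with at least 3 individuals")
    mean_x = sum(dosage) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance on n - 2 degrees of freedom.
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, trait))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return intercept, slope, se_slope


# Hypothetical data: allele-1 counts (0, 1, or 2) and a quantitative trait.
dosage = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
trait = [4.8, 5.1, 5.6, 5.9, 5.4, 6.3, 6.0, 5.0, 5.7, 6.4]
b0, b1, se_b1 = allele_dosage_regression(dosage, trait)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, t = {b1 / se_b1:.2f}")
```

The estimated slope is the average change in the trait per additional copy of allele 1, which is the additive effect the null hypothesis sets to zero.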

Page 59

Family Designs

• Regression-based Tests: covariates in the model account for correlations within families (e.g., between family members)
  – Generalized Estimating Equations (GEE)
  – Linear/Logistic Mixed Models
• Transmission-based tests (specific to genomics)

Page 60

Within-Family Tests of Association

• Parental Controls
  – “transmitted”
  – “nontransmitted”
• Cases and “controls” are well matched (e.g., population substructure is not an issue)

[Figure: trio pedigree showing which parental marker alleles (M1, M2) are transmitted to the affected child and which are not transmitted.]

Page 61

Transmission Disequilibrium Test (TDT)

• As a test for linkage, can use singleton families, affected sib families, and extended families
• H0: No difference in frequency between trans/nontrans
• Comparable to χ² with 1 df (for large samples)

[Figure: example pedigree with marker genotypes, and a 2x2 table of transmitted versus nontransmitted alleles (M1, M2) whose off-diagonal counts are n12 and n21.]

  $\text{TDT} = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}$
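The TDT statistic depends only on the two off-diagonal counts, that is, transmissions from heterozygous parents. The sketch below assumes n12 counts heterozygous parents who transmitted M1 and not M2, and n21 the reverse; that index convention, the function name, and the example counts are assumptions made here for illustration.

```python
import math


def tdt(n12: int, n21: int):
    """Return (statistic, p_value) for the TDT, a McNemar-type test with 1 df."""
    informative = n12 + n21
    if informative == 0:
        raise ValueError("no informative (heterozygous parent) transmissions")
    statistic = (n12 - n21) ** 2 / informative
    # Upper tail of a 1-df chi-squared distribution (large-sample approximation).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical data: heterozygous parents transmitted M1 60 times and M2 40 times.
stat, p = tdt(60, 40)
print(f"TDT = {stat:.2f}, p = {p:.4f}")
```

Homozygous parents contribute nothing to the statistic, which is why the effective sample size is the number of heterozygous parents, not the number of trios.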

Page 62

Association in the Presence of Linkage

• TDT is a test of both linkage AND association; has problems with missing parental data
• Test for Association in the Presence of Linkage (APL) (Martin et al. 2003; Chung et al. 2006)
  – Based on difference between counts of transmitted and nontransmitted alleles
  – With missing parents APL estimates expected count in nontransmitted alleles using EM algorithm

Page 63

Inferring Missing Parental Genotypes

• Test for Association in the Presence of Linkage (APL) (Martin et al. 2003; Chung et al. 2006)
  – Based on difference between counts of transmitted and nontransmitted alleles
  – With missing parents APL estimates expected count in nontransmitted alleles using EM algorithm

Page 64

Results

[Figure: association results with the HLA region and other “Interesting” signals labeled. IMSGC, N Engl J Med. 2007 Aug 30;357(9):851-62]

Page 65

Page 66

Association Software

• Genetics Specific: PLINK (Purcell et al.)
• Standard Statistical packages
  – R statistical programming language
  – SAS
• The programming challenge is handling/looping through thousands of markers

Page 67

Other Designs

• Longitudinal Studies (prospective cohorts)
  – Very very expensive (-)
  – Fewer biases in estimates (+)
• Founder or Isolated Populations
  – Often less genetic heterogeneity (+)
  – Less likely to be able to apply results to other populations (-)
• Imputation Analyses
  – Infer genotypes at unobserved loci (+)
  – Infer genotypes at unobserved loci (-)
• Meta-analyses
  – Combine results across datasets (++++)
  – Virtually all major genetics consortia use these methods

Page 68

Other Analyses

Secondary Research Questions
• Do alleles interact with other alleles to influence phenotypes? (gene x gene interaction)
• Do alleles interact with environmental factors? (gene x environment interaction)
• Are sets of alleles associated with disease? (pathway analyses, gene-based tests)
• Do certain alleles predict disease? (risk prediction, risk score analyses)
• Do certain alleles influence mRNA abundance? (expression-QTL analyses)

Page 69

Thanks! Any questions?