Association mapping: finding genetic variants for common traits & diseases
Manuel Ferreira
Queensland Institute of Medical Research
Brisbane
Genetic Epidemiology
WEHI Postgraduate seminar, 31 May 2010
Why?
Understand disease aetiology
Predict disease risk / drug response (personalized medicine)
Lancet 2010; 375: 1525–35
Rare, monogenic traits
Ng et al. Nature Genetics 2010; 42: 30-35.
[Figure: DISEASE RISK determined by genes (G), environment (E) and other factors (O)]
Common, complex traits
GENETICS OF COMMON DISEASES
[Timeline, 1990 to 2015: phenotypic modelling, linkage analysis, association analysis]
Recent advances in assays/analysis of genetic variation
HapMap, 1000 Genomes
High-throughput genotyping & sequencing
Analytic methods: genome-wide association, imputation, stratification, CNVs, risk prediction
[Figure: DISEASE RISK partitioned into genes, environment (env) and other factors]
HapMap project
1. GOALS
“The HapMap was designed to determine the frequencies and patterns of association among roughly 3 million common Single Nucleotide Polymorphisms (SNPs) in four populations, for use in genetic association studies.” [4]
[Figure: genotype matrix of individuals by SNPs]
[1] The International HapMap Consortium. Nature 2003; 426: 789.
[2] International HapMap Consortium. Nature 2005; 437: 1299.
[3] International HapMap Consortium. Nature 2007; 449: 851.
[4] Manolio et al. J Clin Invest 2008; 118: 1590.
HapMap project
2. STRATEGY
Samples
30 trios, Yoruba in Ibadan, Nigeria (YRI)
30 trios, European descent in Utah (CEU)
45 unrelated Han Chinese from Beijing (CHB)
45 unrelated Japanese from Tokyo (JPT)
Genome-wide SNP discovery (dbSNP)
2002: 1.7 million
2005: 9.2 million
2009: 14.7 million (6.5 million validated)
SNP selection
Phase 1: MAF>0.05, validated, non-synonymous SNPs prioritised (1.27 million total)
Phases 2 and 3 expanded SNP (4 million) and population (11) coverage
Genotyping
7 genotyping platforms used/developed by 12 centres
http://www.hapmap.org/
HapMap project
3. OUTCOMES
“Systematic” catalogue of common human variation
Linkage disequilibrium (LD) or correlation between SNPs (tagging, fine-mapping, imputation)
Designing and refining high-throughput genotyping platforms
Population genetics (selection, sub-structure, recombination & mutation)
Correlation (LD) between SNPs
[Figure: Gene A, HapMap SNPs and haplotypes]
LD measures: D’ and r²
SNP tags (Haploview, Tagger): eg. SNP 1 ‘tags’ 4/10 variants
Genetic coverage: proportion of known SNPs tagged (Haploview)
Fine-mapping: interesting SNPs to follow up; cross-study comparisons
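To make the two LD measures on this slide concrete, here is a minimal sketch of how D’ and r² can be computed for a pair of biallelic SNPs from haplotype and allele frequencies. The function name `ld_stats` and its arguments are illustrative assumptions, not part of Haploview or Tagger.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab : frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a  : frequency of allele A at SNP 1
    p_b  : frequency of allele B at SNP 2
    (Hypothetical helper for illustration only.)
    """
    # Raw disequilibrium: observed haplotype frequency minus its expectation
    # under independence of the two SNPs.
    d = p_ab - p_a * p_b

    # D' rescales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two SNPs.
    var = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / var if var > 0 else 0.0
    return d, d_prime, r2


# Two SNPs with different allele frequencies: D' reaches 1 but r^2 stays low,
# so one SNP would be a poor tag for the other.
print(ld_stats(p_ab=0.1, p_a=0.1, p_b=0.5))
```

Tagging relies on r² rather than D’, because a high D’ can coexist with a low r² when the two SNPs have very different allele frequencies.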
1000 Genomes project
GOAL
http://www.1000genomes.org/
“The 1000 Genomes Project aims to achieve a nearly complete catalog of common human genetic variants (defined as frequency 1% or higher) by generating high-quality sequence data for >85% of the genome for three sets of 400-500 individuals (...)”
2,500 samples at 4x by 2011
High-throughput genotyping & sequencing
Whole-genome genotyping (from $300 USD/sample)
Affymetrix 6.0 chip: >900,000 SNPs; CNV probes; 82% coverage CEU HapMap; accuracy 99.90%
Illumina Human1M BeadChip: >1 million SNPs; CNV probes; 95% coverage CEU HapMap; accuracy 99.94%
Whole-genome sequencing (from $10,000 USD/sample)
Illumina HiSeq 2000: 30x coverage; 100 bp read length
Complete Genomics: 40x coverage; 35 bp read length
Recent advances in assays/analysis of genetic variation
HapMap, 1000 Genomes
High-throughput genotyping & sequencing
Analytic methods: genome-wide association, stratification, imputation, CNV, risk prediction
Examples: recent GWAS.
Analytic methods
1. GENOME-WIDE ASSOCIATION
[Figure: genotype matrix (individuals by SNPs) for cases and controls; panels: No association, Association]
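As an illustration of what a single-SNP case-control test can look like, the sketch below runs a 1-df chi-square test on allele counts. This is one common choice assumed here for illustration; the slides do not say which test statistic was used, and the function name `allelic_chi2` and the counts are hypothetical.

```python
import math


def allelic_chi2(case_counts, control_counts):
    """1-df chi-square test on a 2x2 table of allele counts.

    case_counts, control_counts : (minor allele count, major allele count)
    Returns (chi-square statistic, P value).
    (Hypothetical helper for illustration only.)
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty (e.g. monomorphic SNP): no evidence either way.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Same allele frequency in cases and controls: no association.
print(allelic_chi2((200, 800), (200, 800)))
# Minor allele more frequent in cases: association.
print(allelic_chi2((300, 700), (200, 800)))
```

In a genome-wide scan the same test would be repeated for every SNP, which is why a stringent significance threshold is needed.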
Analytic methods
Study designs
Unrelated individuals
Association tests: between-individual effects
Software: many (eg. PLINK)
Pros: more power per $ spent; easier to collect and analyse
Families
Association tests: between + within-family effects
Software: Merlin, etc
Pros: assess inheritance (CNVs); robust to population stratification
Analytic methods
2. POPULATION STRATIFICATION
Ind1 Ind2 % shared
A1 A2 100
A1 A3 50
A1 A4 25
A1 A5 10
A1 A6 8
A1 B1 5
Genetic matching
[Figure: cases and controls from groups A and B, before and after genetic matching]
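The idea of genetic matching can be sketched as follows: measure how much of the genome each pair of individuals shares, then pair each case with the most similar control. The sketch assumes identity-by-state (IBS) allele sharing as the similarity measure and a simple greedy pairing; both are illustrative assumptions, and the function names and genotypes are hypothetical.

```python
def ibs_sharing(g1, g2):
    """Proportion of alleles shared identical-by-state between two individuals.

    g1, g2 : genotypes coded as minor-allele counts (0, 1, 2); None marks missing data.
    """
    shared = 0
    total = 0
    for x, y in zip(g1, g2):
        if x is None or y is None:
            continue  # skip SNPs missing in either individual
        shared += 2 - abs(x - y)  # 2, 1 or 0 alleles shared at this SNP
        total += 2
    return shared / total if total else float("nan")


def match_controls(cases, controls):
    """Greedy matching: pair each case with the unused control sharing most alleles."""
    available = dict(controls)
    pairs = {}
    for case_id, genotypes in cases.items():
        if not available:
            break
        best = max(available, key=lambda cid: ibs_sharing(genotypes, available[cid]))
        pairs[case_id] = best
        del available[best]
    return pairs


cases = {"A1": [0, 0, 2, 2], "B1": [2, 2, 0, 0]}
controls = {"B2": [2, 2, 0, 1], "A2": [0, 0, 2, 1]}
print(match_controls(cases, controls))
```

Matching on genetic similarity keeps cases and controls from different ancestral groups from being compared directly, which is the source of stratification artefacts.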
Analytic methods
3. IMPUTATION OF UNMEASURED GENOTYPES
[Figure: reference panel (eg. HapMap) and genotyped dataset (individuals by SNPs) combined into a genotyped + imputed dataset]
Software: MACH, IMPUTE, BEAGLE

SNP MAF    N SNPs   Imputation info score  Proportion of SNPs  Average imputation rate  Average concordance
0.01-0.05  27,078   Not imputed            0.000               -                        -
                    0-0.5                  0.325               0.841                    0.966
                    0.5-0.8                0.149               0.917                    0.979
                    ≥0.8                   0.526               0.992                    0.992
0.05-0.15  71,984   Not imputed            0.002               -                        -
                    0-0.5                  0.164               0.525                    0.934
                    0.5-0.8                0.175               0.750                    0.961
                    ≥0.8                   0.659               0.967                    0.989
0.15-0.25  65,918   Not imputed            0.004               -                        -
                    0-0.5                  0.082               0.248                    0.874
                    0.5-0.8                0.164               0.554                    0.939
                    ≥0.8                   0.750               0.939                    0.986
0.25-0.50  146,253  Not imputed            0.004               -                        -
                    0-0.5                  0.053               0.094                    0.777
                    0.5-0.8                0.145               0.389                    0.907
                    ≥0.8                   0.798               0.917                    0.981

Shaun Purcell, Doug Ruderfer (PLINK)
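The table reports an imputation rate and a concordance per SNP bin. A minimal sketch of how such quantities could be computed for one SNP is shown below, assuming that a genotype is "imputed" when its most likely call reaches a posterior probability threshold, and that concordance is the fraction of those calls matching the true genotype. These definitions, the 0.8 threshold and the function name are assumptions for illustration; the slides do not spell out how the table was computed.

```python
def imputation_metrics(posteriors, truth, threshold=0.8):
    """Imputation rate and concordance for one SNP.

    posteriors : list of (P(AA), P(AB), P(BB)) tuples, one per individual
    truth      : list of true genotypes coded 0, 1, 2 (index into each tuple)
    threshold  : minimum posterior probability to accept a call (assumed value)
    """
    called = 0
    concordant = 0
    for probs, true_genotype in zip(posteriors, truth):
        best = max(range(3), key=lambda g: probs[g])  # most likely genotype
        if probs[best] >= threshold:
            called += 1
            if best == true_genotype:
                concordant += 1
    rate = called / len(truth) if truth else float("nan")
    concordance = concordant / called if called else float("nan")
    return rate, concordance


posteriors = [(0.95, 0.05, 0.00), (0.10, 0.85, 0.05), (0.40, 0.50, 0.10), (0.02, 0.08, 0.90)]
truth = [0, 1, 1, 1]
print(imputation_metrics(posteriors, truth))
```

Truth genotypes are only available when some directly genotyped SNPs are masked and then re-imputed, so this kind of check is typically run on a held-out subset.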
Combine data from studies genotyped using different platforms
[Figure: Affy, Illumina and Perlegen SNP sets alongside HapMap]
Example 1: Bipolar Disorder GWAS
                        WTCCC        STEP-UCL     ED-DUB-STEP2  Overall
Sample size
N (% males)             4,764 (45)   3,467 (47)   2,365 (40)    10,596 (44)
Cases (% males)         1,829 (38)   1,460 (43)   1,098 (44)    4,387 (41)
Controls (% males)      2,935 (49)   2,007 (50)   1,267 (36)    6,209 (47)
Genotype missing rate   0.0027       0.0057       0.0031        0.0038
Power (α = 5 × 10⁻⁸)
MAF 0.05, GRR 1.40      0.05         0.02         <0.01         0.61
MAF 0.20, GRR 1.20      0.03         <0.01        <0.01         0.48
MAF 0.40, GRR 1.15      0.02         <0.01        <0.01         0.31
Ferreira et al (2008) Nature Genetics 40: 1056
325,690 SNPs
>1.7 million SNPs
ANK3: Ankyrin G
Cases: 7.0%; Controls: 5.3%; Odds ratio = 1.45
Not related to sex, psychosis or age-of-onset
Replicated recently
Smith et al (2009) Mol Psychiatry 14: 755-63.
Scott et al (2009) Proc Natl Acad Sci USA 106: 7501-6.
[Lee et al (2010) Mol Psychiatry Apr 13 – Han Chinese population]
Example 2: analysis of lymphocyte subsets
Ferreira et al. (2010) Am J Hum Genet 86: 88-92
2,538 individuals | CD4+ T cell levels, CD8+ T cell levels, CD4:CD8 ratio
MHC class I
• rs2524054, C
• Increased CD8+ T levels
• Improved host control of HIV (OR=0.32, P=10⁻⁹)
MHC class II
• rs9270986, A
• Increased CD4+ T levels
• Protective effect for type-1 diabetes (OR = 0.04, P=10⁻¹²⁵)
• Protective effect for rheumatoid arthritis (OR=0.60, P=10⁻¹⁵)
Analytic methods
4. STRUCTURAL VARIANTS
Genomic alterations involving segments of DNA >1kb
Quantitative (Copy Number Variants): deletions, duplications, insertions
Positional (Translocations)
Orientational (Inversions)
Detection of CNVs
Non-polymorphic probes: McCarroll et al 2008 Nat Genet 40: 1166
Polymorphic probes: use polymorphic probes from genotyping arrays to identify and genotype new, potentially rarer CNVs
Example: rs1006737 A/G
probe 1: ... AGCCCGAAATGTTTTCAGA...
probe 2: ... AGCCCGAAGTGTTTTCAGA...
[Figure: intensity of probe 1 against intensity of probe 2, with genotype clusters AA, AG and GG]
Detection of CNVs
Ind  Mat/Pat genotype  Copy number for A  Copy number for G  Total
1    A/G               1                  1                  2
2    A/-               1                  0                  1
3    AA/-              2                  0                  2
4    -/G               0                  1                  1
5    -/-               0                  0                  0
6    AAA/G             3                  1                  4
[Figure: chromosome pattern for each individual at the ...CG ATG... locus]
Detection of CNVs
Polymorphic probe in CNV region
[Figure: normalized intensity of allele A against normalized intensity of allele G, at the ...CG ATG... locus, with genotype clusters A/A, A/G and G/G]
Individuals with deletion(s), ie. total CN < 2
Individuals with duplication(s), ie. total CN > 2
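The copy-number bookkeeping on these slides can be reproduced with a few lines of code. The sketch below counts copies of each allele in a maternal/paternal genotype string and applies the total-copy-number thresholds above; the string format mirrors the slide's table, while the function names are hypothetical.

```python
def copy_numbers(genotype):
    """Return (copies of A, copies of G, total copies) for a genotype such as 'AA/-'.

    The two sides of '/' are the maternal and paternal chromosomes;
    '-' marks a chromosome on which the segment is deleted.
    """
    a = genotype.count("A")
    g = genotype.count("G")
    return a, g, a + g


def cnv_status(total_cn):
    """Classify an individual from total copy number alone."""
    if total_cn < 2:
        return "deletion"
    if total_cn > 2:
        return "duplication"
    return "two copies"


for genotype in ["A/G", "A/-", "AA/-", "-/G", "-/-", "AAA/G"]:
    a, g, total = copy_numbers(genotype)
    print(genotype, a, g, total, cnv_status(total))
```

Note that individual 3 (AA/-) has a total copy number of 2 even though one chromosome carries a duplication and the other a deletion, so total copy number alone does not flag it; the allele-specific counts (2 and 0) do.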
Detection of CNVs
Combine information across probes to identify new CNVs
For example...
                          Cases      Controls
100kb deletion chr. 2     10/5,000   1/5,000
Software
Birdseye (Affy 5.0, 6.0): Korn et al 2008 Nat Genet 40: 1253
PennCNV (Affymetrix and Illumina): Wang et al 2007 Genome Res 17: 1665
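For a rare CNV such as the example deletion (10/5,000 cases against 1/5,000 controls), carrier counts are too small for a chi-square approximation, so an exact test is a natural fit. The sketch below implements a one-sided Fisher's exact test with the standard library; the choice of test and the function name are assumptions for illustration, not something stated on the slide.

```python
from math import comb


def fisher_exact_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact P value for an excess of carriers among cases.

    Returns the hypergeometric probability of observing at least
    `case_carriers` carriers among the cases, given the total number
    of carriers in the combined sample.
    """
    carriers = case_carriers + control_carriers
    n = n_cases + n_controls
    denom = comb(n, n_cases)
    upper = min(carriers, n_cases)
    tail = sum(
        comb(carriers, k) * comb(n - carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom


# The example from the slide: 100kb deletion on chr. 2.
print(fisher_exact_one_sided(10, 5000, 1, 5000))
```

Exact integer arithmetic keeps the calculation stable even though the binomial coefficients involved are very large.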
Example 3: Autism whole-genome CNV analysis
Sample                              16p11        Cases     Controls   P
Discovery [Affy 500K]               Del (600kb)  5/1,441   3/4,234    1.1 × 10⁻⁴
                                    Dup          7/1,441   2/4,234
Replication 1 (CHB) [array-CGH]     Del          5/512     0/434      0.007
                                    Dup          4/512     0/434
Replication 2 (deCODE) [Illumina]   Del          3/299     2/18,834   4.2 × 10⁻⁴
                                    Dup          0/299     5/18,834
Deletion frequency, Iceland
Autism 1%
Psychiatric disorder 0.1%
General population 0.01%
COPPER, Birdseye, CNAT
            del  dup
inherited   2    6
de novo     10   1
unknown     1    4
Weiss et al. N Engl J Med 2008; 358: 667
Example 4: SCZ whole-genome CNV analysis
Shaun Purcell
[Figure: CNVs in cases and controls plotted along each chromosome; genome-wide burden and specific loci]
3,391 patients with SCZ, 3,181 controls
Filter for <1% MAF, >100kb
6,753 CNVs
Genome-wide burden of rare CNVs in SCZ
Cases have greater rate of CNVs than controls: 1.15-fold increase, P = 3×10⁻⁵
Rate of genic CNVs in cases versus controls: 1.18-fold increase, P = 5×10⁻⁶
Rate of non-genic CNVs in cases versus controls: 1.09-fold increase, P = 0.16
Results invariant to obvious statistical controls: array type, genotyping plate, sample collection site, mean probe intensity
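A genome-wide burden comparison asks whether cases carry more rare CNVs per person than controls. One way to test this, sketched below, is to shuffle case/control labels and see how often the shuffled difference in CNV rate is at least as large as the observed one. Permutation is assumed here for illustration; the slides do not state how the P values were obtained, and the function name, defaults and data are hypothetical.

```python
import random


def cnv_burden_test(cnv_counts, is_case, n_perm=2000, seed=0):
    """Permutation test for an excess CNV rate in cases.

    cnv_counts : number of rare CNVs carried by each individual
    is_case    : parallel list of booleans (True = case)
    Returns (observed difference in mean CNVs per person, one-sided empirical P).
    """
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one case and one control")
    total = sum(cnv_counts)

    def rate_diff(labels):
        case_total = sum(c for c, lab in zip(cnv_counts, labels) if lab)
        return case_total / n_case - (total - case_total) / n_ctrl

    observed = rate_diff(is_case)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between CNV count and status
        if rate_diff(labels) >= observed:
            hits += 1
    # Add-one correction so the empirical P value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


counts = [3] * 20 + [0] * 20
status = [True] * 20 + [False] * 20
print(cnv_burden_test(counts, status))
```

The same function can be applied separately to genic and non-genic CNV counts to mirror the comparison on the slide.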
Similar successes for other common diseases
[Figure: N confirmed loci per disease, before Jan 2006 and Jan 2006 to Jan 2008; eg. Crohn’s Disease (31 loci, ~10% variance)]
http://www.genome.gov/gwastudies
Altshuler, Daly & Lander. Science 2008; 322: 881
Manolio, Brooks & Collins. J Clin Invest 2008; 118: 1590
Summary
Tremendous recent technological advances
Large-scale genetic association studies feasible
>150 disease loci unequivocally identified since 2006
Provide a solid base to build our knowledge about disease mechanisms
Hundreds of loci yet to be identified for most diseases