QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval...
Transcript of QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval...
![Page 1: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/1.jpg)
QTL mapping in mice
Karl W Broman
Department of BiostatisticsJohns Hopkins University
Baltimore, Maryland, USA
www.biostat.jhsph.edu/˜kbroman
Outline
� Experiments, data, and goals� Models� ANOVA at marker loci� Interval mapping� LOD scores, LOD thresholds� Mapping multiple QTLs� Simulations
![Page 2: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/2.jpg)
Backcross experiment
P1
A A
P2
B B
F1
A B
BCBC BC BC
Intercross experiment
P1
A A
P2
B B
F1
A B
F2F2 F2 F2
![Page 3: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/3.jpg)
Trait distributions
30
40
50
60
70
A B F1 BC
Strain
Phe
noty
pe
Data and Goals
Phenotypes: ��� = trait value for mouse �Genotypes: ����� = 1/0 if mouse � is BB/AB at marker �
(for a backcross)Genetic map: Locations of markers
Goals:� Identify the (or at least one) genomic regions
(QTLs) that contribute to variation in the trait.� Form confidence intervals for QTL locations.� Estimate QTL effects.
Note: QTL = “quantitative trait locus”
![Page 4: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/4.jpg)
Why?
Mice: Find gene
� � Drug targets, biochemical basis
Agronomy: Selection for improvement
Flies: Genetic architecture
� � Evolution
80
60
40
20
0
Chromosome
Loca
tion
(cM
)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X
Genetic map
![Page 5: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/5.jpg)
20 40 60 80 100 120
20
40
60
80
100
120
Markers
Indi
vidu
als
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X
Genotype data
Statistical structure
QTL
Markers Phenotype
Covariates
The missing data problem:
Markers � � QTL
The model selection problem:
QTL, covariates � � phenotype
![Page 6: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/6.jpg)
Models: Recombination
We assume no crossover interference.
��� Points of exchange (crossovers) are accordingto a Poisson process.
��� The� ������ (marker genotypes) form a Markov
chain
Example
A B − B A A
B A A − B A
A A A A A B
?
![Page 7: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/7.jpg)
Models: Genotype � � Phenotype
Let � = phenotype� = whole genome genotype
Imagine a small number of QTLs with genotypes ��������������� .( � distinct genotypes)
E ��� ��� � ��������������� ��� var ��� ��� � �� ������������� ���
Models: Genotype � � Phenotype
Homoscedasticity (constant variance): �! �#" �$
Normally distributed residual variation: ��� �&% ' �(� � �$ � .
Additivity: �!� � ��������� � � � �*) + ��, �.- � � � ( � � � / or 0 )
Epistasis: Any deviations from additivity.
![Page 8: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/8.jpg)
The simplest method: ANOVA
� Split mice into groupsaccording to genotypeat a marker.
� Do a t-test / ANOVA.� Repeat for each marker.
40
50
60
70
80
Phe
noty
pe
BB AB
Genotype at D1M30
BB AB
Genotype at D2M99
ANOVA at marker loci
Advantages� Simple.� Easily incorporate
covariates.� Easily extended to more
complex models.� Doesn’t require a genetic
map.
Disadvantages� Must exclude individuals
with missing genotype data.� Imperfect information about
QTL location.� Suffers in low density scans.� Only considers one QTL at a
time.
![Page 9: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/9.jpg)
Interval mapping (IM)
Lander & Botstein (1989)� Take account of missing genotype data� Interpolate between markers� Maximum likelihood under a mixture model
0 20 40 60
0
2
4
6
8
Map position (cM)
LOD
sco
re
Interval mapping (IM)
Lander & Botstein (1989)� Assume a single QTL model.� Each position in the genome, one at a time, is posited as the
putative QTL.� Let � � 1/0 if the (unobserved) QTL genotype is BB/AB.
Assume � % ' ��� � � �� Given genotypes at linked markers, � % mixture of normal dist’ns
with mixing proportion��� � � / �marker data � :
QTL genotype�����BB AB
BB BB �� ����������� ������������� ������ ������������ ������BB AB �� �������� �����!� ���"�� #���������!�AB BB � � �� ���� � ���!� �� $��� � � � � �!�AB AB � � � � ���� ������ �� ���� � ���� #��� � ������ ������
![Page 10: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/10.jpg)
The normal mixtures
� � � �7 cM 13 cM
� Two markers separated by 20 cM,with the QTL closer to the leftmarker.
� The figure at right show the dis-tributions of the phenotype condi-tional on the genotypes at the twomarkers.
� The dashed curves correspond tothe components of the mixtures.
20 40 60 80 100
Phenotype
BB/BB
BB/AB
AB/BB
AB/AB
µ0
µ0
µ0
µ0
µ1
µ1
µ1
µ1
Interval mapping (continued)
Let ��� � ��� � � � / �marker data �
� � � � � % ' ����� � � ���� � � �marker data � ��� � � � � � � � � ��� � � � � � � � ) / � � � � � � � ��� � � �
where � �� � � � � � density of normal distribution
Log likelihood: � � � � � � � � � � + ������� ��� � � �marker data � ��� � � � � � �
Maximum likelihood estimates (MLEs) of � � , � � , � :
EM algorithm.
![Page 11: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/11.jpg)
LOD scores
The LOD score is a measure of the strength of evidence for thepresence of a QTL at a particular location.
LOD � � � � � � � � likelihood ratio comparing the hypothesis of aQTL at position � versus that of no QTL
� � � � / 0� ��� � �QTL at � ������ � ���� � � ���� � ���� ��� no QTL ���� ���� � �
�� � � ���� � � ���� � are the MLEs, assuming a single QTL at position � .
No QTL model: The phenotypes are independent and identicallydistributed (iid) ' � � �� � .
An example LOD curve
0 20 40 60 80 100
0
2
4
6
8
10
12
Chromosome position (cM)
LOD
![Page 12: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/12.jpg)
0 500 1000 1500 2000 2500
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Map position (cM)
lod
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19LOD curves
Interval mapping
Advantages� Takes proper account of
missing data.� Allows examination of
positions between markers.� Gives improved estimates
of QTL effects.� Provides pretty graphs.
Disadvantages� Increased computation
time.� Requires specialized
software.� Difficult to generalize.� Only considers one QTL at
a time.
![Page 13: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/13.jpg)
LOD thresholds
Large LOD scores indicate evidence for the presence of a QTL.
Q: How large is large?
� We consider the distribution of the LOD score under the nullhypothesis of no QTL.
Key point: We must make some adjustment for our examination ofmultiple putative QTL locations.
� We seek the distribution of the maximum LOD score, genome-wide. The 95th %ile of this distribution serves as a genome-wideLOD threshold.
Estimating the threshold: simulations, analytical calculations, per-mutation (randomization) tests.
Null distribution of the LOD score
� Null distribution derived bycomputer simulation of backcrosswith genome of typical size.
� Solid curve: distribution of LODscore at any one point.
� Dashed curve: distribution ofmaximum LOD score,genome-wide.
0 1 2 3 4
LOD score
![Page 14: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/14.jpg)
Permutation tests
mice
markers
genotypedata
phenotypes
� LOD � � �(a set of curves)
�� �
������ LOD � � �
� Permute/shuffle the phenotypes; keep the genotype data intact.� Calculate LOD
� � � ��� � � ������ LOD � � �
� We wish to compare the observed�
to the distribution of�
.�� �� � � �� � � is a genome-wide P-value.� The 95th %ile of
� is a genome-wide LOD threshold.
� We can’t look at all ��� possible permutations, but a random set of 1000 is feasi-ble and provides reasonable estimates of P-values and thresholds.
� Value: conditions on observed phenotypes, marker density, and pattern of miss-ing data; doesn’t rely on normality assumptions or asymptotics.
Permutation distribution
maximum LOD score
0 1 2 3 4 5 6 7
95th percentile
![Page 15: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/15.jpg)
Multiple QTL methods
Why consider multiple QTLs at once?
� Reduce residual variation.� Separate linked QTLs.� Investigate interactions between QTLs (epistasis).
Epistasis in a backcross
Additive QTLs
Interacting QTLs
0
20
40
60
80
100
Ave
. phe
noty
pe
A HQTL 1
A
H
QTL 2
0
20
40
60
80
100
Ave
. phe
noty
pe
A HQTL 1
A
H
QTL 2
![Page 16: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/16.jpg)
Epistasis in an intercross
Additive QTLs
Interacting QTLs
0
20
40
60
80
100
Ave
. phe
noty
pe
A H BQTL 1
AH
BQTL 2
0
20
40
60
80
100
Ave
. phe
noty
pe
A H BQTL 1
A
H
BQTL 2
Abstractions / simplifications
� Complete marker data
� QTLs are at the marker loci
� QTLs act additively
![Page 17: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/17.jpg)
The problem
n backcross mice; M markers
� ��� = genotype (1/0) of mouse � at marker �� � = phenotype (trait value) of mouse �
� � � � )�
� � - � ����� ) � � Which - � �� 0 ?
��� Model selection in regression
How is this problem different?
� Relationship among the x’s
� Find a good model vs. minimize prediction error
![Page 18: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/18.jpg)
Model selection
� Select class of models– Additive models
– Add’ve plus pairwise interactions
– Regression trees
� Compare models
– BIC � ��� � �������RSS ��� � ��� �"���
����� �� �
– Sequential permutation tests
– Estimate of prediction error
� Search model space– Forward selection (FS)
– Backward elimination (BE)
– FS followed by BE
– MCMC
� Assess performance– Maximize no. QTLs found;
control false positive rate
Why BIC � ?
� For a fixed no. markers, letting � � � , BIC � is consistent.
� There exists a prior (on models + coefficients) for whichBIC � is the –log posterior.
� BIC � is essentially equivalent to use of a threshold on theconditional LOD score
� It performs well.
![Page 19: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/19.jpg)
Choice of �
Smaller � : include more loci; higher false positive rate
Larger � : include fewer loci; lower false positive rate
Let L = 95% genome-wide LOD threshold(compare single-QTL models to the null model)
Choose � = 2 L / ����� �� �
With this choice of � , in the absence of QTLs, we’llinclude at least one extraneous locus, 5% of thetime.
Simulations
� Backcross with n=250� No crossover interference� 9 chr, each 100 cM� Markers at 10 cM spacing;
complete genotype data� 7 QTLs
– One pair in coupling– One pair in repulsion– Three unlinked QTLs
� Heritability = 50%� 2000 simulation replicates
1
2
3
4
5
6
7
8
9
![Page 20: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/20.jpg)
Methods
� ANOVA at marker loci� Composite interval mapping (CIM)� Forward selection with permutation tests� Forward selection with BIC �� Backward elimination with BIC �� FS followed by BE with BIC �� MCMC with BIC �
� � A selected marker is deemed correct if it is within10 cM of a QTL (i.e., correct or adjacent)
0
1
2
3
4
5
6
7
Correct
Ave
no.
cho
sen
AN
OV
A 3 5 7 9 11
CIM fs, p
erm fs be
fs/b
e
mcm
c
BIC
![Page 21: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/21.jpg)
0.0
0.5
1.0
1.5
2.0
QTLs linked in coupling
Ave
no.
cho
sen
AN
OV
A 3 5 7 9 11
CIM fs, p
erm fs be
fs/b
e
mcm
c
BIC
0.0
0.5
1.0
1.5
2.0
QTLs linked in repulsion
Ave
no.
cho
sen
AN
OV
A 3 5 7 9 11
CIM fs, p
erm fs be
fs/b
e
mcm
c
BIC
![Page 22: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/22.jpg)
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Other QTLs
Ave
no.
cho
sen
AN
OV
A 3 5 7 9 11
CIM fs, p
erm fs be
fs/b
e
mcm
c
BIC
0.0
0.1
0.2
0.3
0.4
Extraneous linked
Ave
no.
cho
sen
AN
OV
A 3 5 7 9 11
CIM fs, p
erm fs be
fs/b
e
mcm
c
BIC
![Page 23: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/23.jpg)
0.00
0.01
0.02
0.03
0.04
0.05
0.06
Extraneous unlinked
Ave
no.
cho
sen
AN
OV
A 3 5 7 9 11
CIM fs, p
erm fs be
fs/b
e
mcm
c
BIC
Summary
� QTL mapping is a model selection problem.
� Key issue: the comparison of models.
� Large-scale simulations are important.
� More refined procedures do not necessarily giveimproved results.
� BIC � with forward selection followed by backwardelimination works quite well (in the case of additiveQTLs).
![Page 24: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/24.jpg)
Acknowledgements
Terry Speed, University of California, Berkeley, and WEHI
Gary Churchill, The Jackson Laboratory
Saunak Sen, University of California, San Francisco
References
� Broman KW (2001) Review of statistical methods for QTL mapping in experi-mental crosses. Lab Animal 30(7):44–52
Review for non-statisticians� Broman KW, Speed TP (1999) A review of methods for identifying QTLs in
experimental crosses. In: Seillier-Moiseiwitch F (ed) Statistics in MolecularBiology. IMS Lecture Notes—Monograph Series. Vol 33, pp. 114–142
Older, more statistical review.� Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantita-
tive traits using RFLP linkage maps. Genetics 121:185–199
The seminal paper.� Churchill GA, Doerge RW (1994) Empirical threshold values for quantitative trait
mapping. Genetics 138:963–971
LOD thresholds by permutation tests.� Miller AJ (2002) Subset selection in regression, 2nd edition. Chapman & Hall,
New York.
A reasonably good book on model selection in regression.
![Page 25: QTL mapping in mice - UW–Madisonkbroman/presentations/qtl_nz... · 2003-01-14 · Interval mapping (IM) Lander & Botstein (1989) Take account of missing genotype data Interpolate](https://reader036.fdocuments.net/reader036/viewer/2022062603/5f0243fb7e708231d4036775/html5/thumbnails/25.jpg)
� Strickberger MW (1985) Genetics, 3rd edition. Macmillan, New York, chapter11.
An old but excellent general genetics textbook with a very interesting discussionof epistasis.
� Broman KW, Speed TP (2002) A model selection approach for the identificationof quantitative trait loci in experimental crosses (with discussion). J Roy StatSoc B 64:641–656, 737–775
Contains the simulation study described above.