Post on 19-Dec-2015
Gene-gene and gene-environment interactions
Manuel Ferreira
Massachusetts General Hospital
Harvard Medical School
Center for Human Genetic Research
Slides can be found at:
http://pngu.mgh.harvard.edu/~mferreira/
Outline
1. G-G and G-E interactions in the context of gene mapping
2. What is epistasis?
3. Study designs and tests to detect epistasis
4. Application to genome-wide datasets
1. G-G and G-E in context
Chromosome 4 DNA sequence
SNP (single nucleotide polymorphism)
…GGCGGTGTTCCGGGCCATCACCATTGCGGGCCGGATCAACTGCCCTGTGTACATCACCAAGGTCATGAGCAAGAGTGCAGCCGACATCATCGCTCTGGCCAGGAAGAAAGGGCCCCTAGTTTTTGGAGAGCCCATTGCCGCCAGCCTGGGGACCGATGGCACCCATTACTGGAGCAAGAACTGGGCCAAGGCTGCGGCGTTCGTGACTTCCCCTCCCCTGAGCCCGGACCCTACCACGCCCGACTA…
Find disease-causing variation
The Human Genome
?
[Figure: gene effect plotted against environmental effect]
The environment modifies the effect of a gene
A gene modifies the effect of an environment
G × E interaction
Gene-environment interaction
S.Purcell ©
Epistasis
[Figure: gene effect plotted against gene effect]
Epistasis: one gene modifies the effect of another
Gene × gene interaction
S.Purcell ©
2. Definition(s) of epistasis
Epistasis or not?
      AA  Aa  aa
BB     1   1   3
Bb     2   2   4
bb     3   3   5
Definitions of epistasis
Biological: individual-level phenomenon
Statistical: population-level phenomenon
[Figure: 3 × 3 grid of two-locus genotypes, columns BB, Bb, bb and rows AA, Aa, aa]
S.Purcell ©
Bateson (1909)
[Figure: pigment pathway with Gene RED, Gene YELLOW, Pigment 1, Pigment 2 and Final pigment, shown against the two-locus genotype grid (columns AA, Aa, aa; rows BB, Bb, bb). Successive builds mark a blocked step with X.]
Introduced the concept of epistasis as a “masking effect”, whereby a variant or allele at one locus prevents the variant at another locus from manifesting its effect.
Mendelian concept, closer to the biological definition of interaction between 2 molecules.
Fisher (1918)
Epistasis defined as the extent to which the joint contribution of two alleles in different loci towards a phenotype deviates from that expected under a purely additive model.
[Figure labels: Gene RED, Gene YELLOW]

Single-locus effects: AA = 0, Aa = 2, aa = 2; BB = 0, Bb = 1, bb = 1

Expected
      AA  Aa  aa
BB     0   2   2
Bb     1   3   3
bb     1   3   3

Observed
      AA  Aa  aa
BB     0   2   2
Bb     1   2   2
bb     1   2   2

Mathematical concept, closer to the statistical definition of interaction between 2 variables on a linear scale.
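The expected and observed grids in the Fisher slide can be checked with a few lines of Python. This is only a sketch: it assumes the single-locus effects read as 0, 2, 2 for AA, Aa, aa and 0, 1, 1 for BB, Bb, bb, and that the observed grid is the one with 2s in the lower-right block. The variable names are illustrative and do not come from the slides.

```python
import numpy as np

# Single-locus effects as read from the slide (assumed layout):
# columns are AA, Aa, aa and rows are BB, Bb, bb.
effect_a = np.array([0, 2, 2])
effect_b = np.array([0, 1, 1])

# Purely additive expectation: each cell is the sum of the two single-locus effects.
expected = effect_b[:, None] + effect_a[None, :]

# Observed two-locus genotypic values as read from the slide.
observed = np.array([[0, 2, 2],
                     [1, 2, 2],
                     [1, 2, 2]])

# Fisher's epistasis is the deviation of observed from the additive expectation.
deviation = observed - expected
print(expected)
print(deviation)
```

Any non-zero cell in `deviation` marks a genotype combination where the joint effect differs from the sum of the single-locus effects.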
Dominance is defined as the extent to which the joint contribution of two alleles in the same locus towards a phenotype deviates from that expected by a purely additive model.
[Figure: genotypic mean (0 to 2) by genotype (AA, Aa, aa) in three panels: Additive, Dominant, Recessive]
Epistasis is very similar... Deviation from additivity between loci.
[Figure: genotypic mean (0 to 4) by Locus A genotype (AA, Aa, aa), one line per Locus B genotype (BB, Bb, bb). Panels vary the within-locus effect at Locus A and at Locus B (additive or no effect) and the between-loci effect.]
Between loci: Additive (i.e. NO epistasis)
Genotypic means for each combination of within-locus models. Each grid has columns AA, Aa, aa and rows BB, Bb, bb.

                        Locus A
Locus B          Additive    Dominant    Recessive
Additive    BB    0 1 2       0 2 2       0 0 2
            Bb    1 2 3       1 3 3       1 1 3
            bb    2 3 4       2 4 4       2 2 4

Dominant    BB    0 1 2       0 2 2       0 0 2
            Bb    2 3 4       1 3 3       2 2 4
            bb    2 3 4       1 3 3       2 2 4

Recessive   BB    0 1 2       0 2 2       0 0 2
            Bb    0 1 2       0 2 2       0 0 2
            bb    2 3 4       1 3 3       2 2 4
Between loci: Non-additive (i.e. epistasis)
Six example models. Each grid has columns AA, Aa, aa and rows BB, Bb, bb.

      AA Aa aa      AA Aa aa      AA Aa aa
BB     0  0  0       0  0  0       0  0  0
Bb     0  1  1       0  1  2       0  0  1
bb     0  1  1       0  2  3       0  1  2

BB     0  0  0       2  2  4       1  1  1
Bb     0  0  0       2  4  2       1  1  1
bb     0  0  4       4  2  2       1  1  8
Epistasis or not?
      AA  Aa  aa
BB     1   1   3
Bb     2   2   4
bb     3   3   5
Statistical definition of epistasis is scale dependent
Defined epistasis as a departure from an additive model across loci.
Crucial assumption: genotype effects are measured on the appropriate scale.
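[Figure: increments of +4 and +4 between genotypes on the original scale become +0.7 and +0.4 after a log(x) transformation]

x: no departure from additivity
      AA    Aa    aa
BB     1     1     3
Bb     2     2     4
bb     3     3     5

log(x): significant departure from additivity
      AA    Aa    aa
BB   0.00  0.00  1.10
Bb   0.69  0.69  1.39
bb   1.10  1.10  1.61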
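The scale dependence can be made concrete with a small Python sketch. It takes the "Epistasis or not?" grid (1 1 3 / 2 2 4 / 3 3 5) and measures departure from additivity as the residual after removing row and column means. Two choices here are assumptions of this sketch rather than the slides' method: weighting every genotype cell equally (ignoring genotype frequencies), and the helper name `additivity_residuals`.

```python
import numpy as np

def additivity_residuals(grid):
    """Residuals after removing row and column main effects (equal cell weights).

    All residuals are zero exactly when the grid is additive across the two loci.
    """
    grid = np.asarray(grid, dtype=float)
    row_means = grid.mean(axis=1, keepdims=True)
    col_means = grid.mean(axis=0, keepdims=True)
    return grid - row_means - col_means + grid.mean()

# Columns AA, Aa, aa; rows BB, Bb, bb.
x = np.array([[1, 1, 3],
              [2, 2, 4],
              [3, 3, 5]])

raw_residuals = additivity_residuals(x)          # additive on the original scale
log_residuals = additivity_residuals(np.log(x))  # no longer additive after log(x)

print(np.round(raw_residuals, 2))
print(np.round(log_residuals, 2))
```

The same genotypic values therefore show "no epistasis" on one scale and "epistasis" on another, which is why the choice of scale has to be justified before a statistical interaction is interpreted.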
Disease trait

Penetrances
      AA       Aa       aa
BB   pAABB    pAaBB    paaBB
Bb   pAABb    pAaBb    paaBb
bb   pAAbb    pAabb    paabb

Relative Risks
      AA       Aa       aa
BB   RRAABB   RRAaBB   RRaaBB
Bb   RRAABb   RRAaBb   RRaaBb
bb   RRAAbb   RRAabb   RRaabb

Odds Ratios
      AA       Aa       aa
BB   ORAABB   ORAaBB   ORaaBB
Bb   ORAABb   ORAaBb   ORaaBb
bb   ORAAbb   ORAabb   ORaabb

Continuous trait

Genotype Means
      AA       Aa       aa
BB   μAABB    μAaBB    μaaBB
Bb   μAABb    μAaBb    μaaBb
bb   μAAbb    μAabb    μaabb
Genotype effects measured on:    Epistasis defined as departure from:
Penetrance scale                 Additive model
Linear scale                     Additive model
RR scale                         Multiplicative model
OR scale                         Multiplicative model

Additive: y = LocusA + LocusB
Multiplicative: y = LocusA × LocusB
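To see how the choice of scale changes the verdict for a disease trait, the sketch below builds a penetrance grid that is exactly multiplicative, converts it to relative risks and odds ratios against the AABB genotype, and checks each scale against its reference model. The baseline penetrance and the per-locus risk multipliers are hypothetical values chosen for illustration, and using AABB as the reference genotype is also an assumption.

```python
import numpy as np

# Hypothetical inputs: baseline penetrance for AABB and per-locus risk multipliers.
baseline = 0.01
risk_a = np.array([1.0, 1.5, 2.0])   # AA, Aa, aa
risk_b = np.array([1.0, 2.0, 3.0])   # BB, Bb, bb

# Multiplicative model on the penetrance scale (rows BB, Bb, bb; columns AA, Aa, aa).
penetrance = baseline * risk_b[:, None] * risk_a[None, :]

# Relative risks and odds ratios relative to the AABB cell.
relative_risk = penetrance / penetrance[0, 0]
odds = penetrance / (1.0 - penetrance)
odds_ratio = odds / odds[0, 0]

# Departure from a multiplicative model: cell minus product of its two margins.
rr_departure = relative_risk - np.outer(relative_risk[:, 0], relative_risk[0, :])
or_departure = odds_ratio - np.outer(odds_ratio[:, 0], odds_ratio[0, :])

# Departure from an additive model on the penetrance scale.
additive_expectation = penetrance[:, [0]] + penetrance[[0], :] - penetrance[0, 0]
penetrance_departure = penetrance - additive_expectation

print(np.round(rr_departure, 4))
print(np.round(or_departure, 4))
print(np.round(penetrance_departure, 4))
```

With these made-up numbers the RR scale shows no departure, the penetrance scale shows a clear departure from additivity, and the OR scale shows a small departure that grows as the disease becomes more common.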
3. Designs and methods to detect epistasis
Study designs
Family-based, Case-Control, Case-only
Towards family-based: more robust, fewer assumptions
Towards case-only: more efficient, powerful
Methods
1. Regression
2. “Linkage Disequilibrium” or allelic-association
3. Transmission distortion
Methods 1. Regression
y = m1.LocusA + m2.LocusB + m3.(LocusA × LocusB)
y = (m1 + m3.LocusB).LocusA + m2.LocusB
Effect of LocusA on y is modified by LocusB
Continuous trait y: linear regression
Disease trait y: logistic regression
Methods 1. Regression
y = m1.LocusA + m2.Env + m3.(LocusA × Env)
y = (m1 + m3.Env).LocusA + m2.Env
Effect of LocusA on y is modified by Env
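The regression model with an interaction term can be sketched for a continuous trait as follows. Everything specific here is an assumption made for illustration: the simulated data, the 0/1/2 allele-count coding, the effect sizes, the added intercept, and the use of ordinary least squares through `numpy.linalg.lstsq`. For a disease trait the same design matrix would go into a logistic regression, and replacing `locus_b` with an environmental covariate gives the G × E version.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical coding: number of copies (0, 1, 2) of the minor allele at each locus.
locus_a = rng.binomial(2, 0.3, size=n)
locus_b = rng.binomial(2, 0.3, size=n)

# Simulated continuous trait with made-up effects m1, m2 and interaction m3.
m1, m2, m3 = 0.5, 0.3, 0.5
y = m1 * locus_a + m2 * locus_b + m3 * locus_a * locus_b + rng.normal(0.0, 1.0, size=n)

# Design matrix: intercept, LocusA, LocusB and the product term LocusA x LocusB.
design = np.column_stack([np.ones(n), locus_a, locus_b, locus_a * locus_b])
coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept_hat, m1_hat, m2_hat, m3_hat = coefficients

# The test for epistasis is the test of m3 = 0; here we only print the estimates.
print(round(m1_hat, 2), round(m2_hat, 2), round(m3_hat, 2))
```

The estimated effect of LocusA for a given genotype at LocusB is `m1_hat + m3_hat * locus_b`, which is the rearranged form of the model shown on the slide.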
Methods 2. LD-based
Epistasis induces “LD” in cases, even for unlinked loci:
p(a) = 0.2
p(b) = 0.2

Epistasis model
      AA   Aa   aa
BB     1    1    1
Bb     1    1    1
bb     1    1    1

Genotype frequencies (cases and controls alike)
      AA   Aa   aa
BB   .41  .21  .02
Bb   .21  .10  .01
bb   .03  .01  .00

“Haplotype frequencies” (cases and controls alike)
       A      a
B    .640   .160
b    .160   .040

“LD”: cases ~ 0, controls ~ 0
Methods 2. LD-based
Epistasis induces “LD” in cases, even for unlinked loci:
p(a) = 0.2
p(b) = 0.2

Epistasis model
      AA   Aa   aa
BB     1    1    1
Bb     1    1    1
bb     1    1   20

Genotype frequencies, cases
      AA   Aa   aa
BB   .40  .20  .03
Bb   .20  .10  .01
bb   .03  .01  .02

Genotype frequencies, controls
      AA   Aa   aa
BB   .41  .21  .02
Bb   .21  .10  .01
bb   .03  .01  .00

“Haplotype frequencies”, cases
       A      a
B    .630   .158
b    .158   .054

“Haplotype frequencies”, controls
       A      a
B    .640   .160
b    .160   .040

“LD”: cases ~ 0.05, controls ~ 0
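The idea that epistasis induces association between unlinked loci among cases can be sketched numerically. The code below assumes Hardy-Weinberg proportions, independent loci in the general population, controls that look like the general population, and that the entries of the epistasis model act as relative risks. These are assumptions of this sketch, so the output will not necessarily match the rounded figures on the slide, and the helper names are illustrative.

```python
import numpy as np

def genotype_freqs(q):
    """Hardy-Weinberg frequencies for 0, 1, 2 copies of an allele with frequency q."""
    p = 1.0 - q
    return np.array([p * p, 2.0 * p * q, q * q])

def allele_count_correlation(joint):
    """Correlation between allele counts at two loci from a 3x3 joint frequency table."""
    counts = np.arange(3)
    freq_b = joint.sum(axis=1)
    freq_a = joint.sum(axis=0)
    mean_a, mean_b = counts @ freq_a, counts @ freq_b
    covariance = counts @ joint @ counts - mean_a * mean_b
    var_a = (counts ** 2) @ freq_a - mean_a ** 2
    var_b = (counts ** 2) @ freq_b - mean_b ** 2
    return covariance / np.sqrt(var_a * var_b)

# Unlinked loci with p(a) = p(b) = 0.2; rows BB, Bb, bb and columns AA, Aa, aa.
controls = np.outer(genotype_freqs(0.2), genotype_freqs(0.2))

# Epistasis model from the slide: risk 1 everywhere except 20 for aabb.
model = np.ones((3, 3))
model[2, 2] = 20.0

# Cases: population frequencies reweighted by risk, then renormalised.
cases = controls * model
cases /= cases.sum()

r_controls = allele_count_correlation(controls)
r_cases = allele_count_correlation(cases)
print(round(r_controls, 3), round(r_cases, 3))
```

The correlation is zero in controls by construction and positive in cases, which is the signal that the LD-based tests look for.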
Two-locus genotypes
Locus A: alleles A (pA) and a (qA), pA + qA = 1
Locus B: alleles B (pB) and b (qB), pB + qB = 1

               AA (pA²)   Aa (2pAqA)   aa (qA²)
BB (pB²)       AABB       AaBB         aaBB
Bb (2pBqB)     AABb       AaBb         aaBb
bb (qB²)       AAbb       Aabb         aabb

AAbb (2-locus genotype) = Ab / Ab (haplotype) if and only if A and b are carried on the same chromosome; otherwise AAbb ≠ Ab / Ab.
Methods 2. LD-based
In the presence of epistasis:
Case-only: LD cases > 0, H0: δ = 0
Case-Control: LD cases > LD controls, H0: δCases = δControls
Statistics that measure the strength of association (δ) between two loci: LD (D, r2), correlation
[Figure: genes in the 5q GABA cluster, cases (Scz) and controls. Pamela Sklar, Tracey Petryshen, C&M Pato]
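The two null hypotheses above can be illustrated with a correlation-based statistic. The sketch below uses the Fisher z-transformation of a correlation coefficient, which is a swapped-in textbook device chosen for illustration and not necessarily the statistic used in the slides or in PLINK. The function names and the example inputs are hypothetical.

```python
import math

def case_only_z(r_cases, n_cases):
    """z statistic for H0: delta = 0, using the Fisher z-transformation of r."""
    return math.atanh(r_cases) * math.sqrt(n_cases - 3)

def case_control_z(r_cases, n_cases, r_controls, n_controls):
    """z statistic for H0: delta_cases = delta_controls."""
    standard_error = math.sqrt(1.0 / (n_cases - 3) + 1.0 / (n_controls - 3))
    return (math.atanh(r_cases) - math.atanh(r_controls)) / standard_error

# Hypothetical example: correlation 0.2 in 203 cases, 0.0 in 203 controls.
z_case_only = case_only_z(0.2, 203)
z_case_control = case_control_z(0.2, 203, 0.0, 203)
print(round(z_case_only, 2), round(z_case_control, 2))
```

For the same observed correlation in cases, the case-only statistic is larger because it does not pay for the sampling error in controls. That gain rests on the assumption that the true association in the source population is zero.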
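Methods 3. Transmission distortion
If the effect of locus A on disease risk is modified by Locus B:
[Figure: trios with parents AA and Aa and an Aa proband, shown separately by the proband's genotype at Locus B]
Transmission rate by proband genotype at Locus B: BB probands 50%, Bb probands 52%, bb probands 56%
Same applies for Env instead of Locus B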
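One way to turn the stratified transmission rates into a test is a chi-square test of homogeneity across the Locus B strata. This is an illustrative choice, not necessarily the test the slides have in mind. The counts below are hypothetical: 1000 informative transmissions per stratum, chosen so that the rates equal the 50%, 52% and 56% shown above.

```python
def transmission_heterogeneity(transmitted, untransmitted):
    """Chi-square statistic for equal transmission rates across strata.

    Degrees of freedom are (number of strata - 1).
    """
    total_t = sum(transmitted)
    total_u = sum(untransmitted)
    total = total_t + total_u
    chi_square = 0.0
    for t, u in zip(transmitted, untransmitted):
        n = t + u
        expected_t = n * total_t / total
        expected_u = n * total_u / total
        chi_square += (t - expected_t) ** 2 / expected_t
        chi_square += (u - expected_u) ** 2 / expected_u
    return chi_square

# Hypothetical counts for BB, Bb and bb probands.
transmitted = [500, 520, 560]
untransmitted = [500, 480, 440]

rates = [t / (t + u) for t, u in zip(transmitted, untransmitted)]
chi_square = transmission_heterogeneity(transmitted, untransmitted)
print(rates, round(chi_square, 2))
```

With 2 degrees of freedom, the 5% critical value is about 5.99, so these made-up counts would suggest that transmission at locus A differs by genotype at Locus B.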
TDT requires assumption of independence between loci
If variants A and B are in LD (common haplotypes AB / ab):
[Figure: trios in the subset of bb probands and in the subset of BB probands, with transmission rates tending to 100% or 0%]
False positive interactions (due to linkage or population stratification)
Design & Methods
Designs: Case-Control, Case-only, Family-based
Methods: Regression, LD-based, TDT
[Figure: power (0 to 100) under "No interaction" and "Interaction" for four samples: 100 cases, 100 controls; 200 cases, 200 controls; 200 cases only; 200 controls only]
Case-only designs offer efficient detection of epistasis
Case-only design isn’t always valid
1. Physical distance
2. Population substructure in case sample (stratification)
[Figure: Gene A and Gene B]
                Pros                                               Cons
Case-Control    Applicable to linked loci                          Less efficient
Case-only       Efficient, powerful                                Assumptions (unlinked loci, no stratification, etc)
Family-based                                                       Few methods that efficiently handle relatives
LD              Fast, often more powerful                          Less useful for continuous traits and/or family data
Regression      Many extensions possible (GxE, covariates, etc)    Slow(er)
PLINK
4. Application to genome-wide datasets
# SNPs      # pairs
5           10
50          1,225
500         124,750
250,000     31,249,875,000
500,000     124,999,750,000
An “all pairs of SNPs” approach to epistasis does not scale well…
… but it is feasible! ~1 week, running PLINK using ~200 CPUs.
>3000 individuals
Multiple testing increases false positives
[Figure: P(at least 1 false positive), from 0 to 1, against the number of independent tests performed, from 0 to 50. One curve for a per-test false positive rate of 0.05 and one for a per-test false positive rate of 0.001 = 0.05/50.]
Multiple testing increases false positives
# SNPs      # pairs             P-value needed
5           10                  5e-3
50          1,225               4e-5
500         124,750             4e-7
250,000     31,249,875,000      2e-12
500,000     124,999,750,000     4e-13
P-value required for experiment-wide significance must be adjusted for the number of tests performed
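The numbers in the two tables follow from short formulas: the number of SNP pairs is n(n - 1)/2, the chance of at least one false positive across k independent tests is 1 - (1 - α)^k, and a Bonferroni-style threshold is α divided by the number of tests. A minimal Python sketch follows. The function names are illustrative, and the thresholds assume an experiment-wide α of 0.05.

```python
def n_pairs(n_snps):
    """Number of distinct SNP pairs among n_snps SNPs."""
    return n_snps * (n_snps - 1) // 2

def familywise_error(n_tests, per_test_alpha):
    """P(at least 1 false positive) across independent tests."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests

def bonferroni_threshold(n_tests, experiment_alpha=0.05):
    """Per-test P-value needed for experiment-wide significance."""
    return experiment_alpha / n_tests

for n_snps in (5, 50, 500, 250_000, 500_000):
    pairs = n_pairs(n_snps)
    print(f"{n_snps:>9,} {pairs:>19,} {bonferroni_threshold(pairs):.0e}")

# 50 independent tests at a per-test rate of 0.05 versus 0.001 = 0.05/50.
print(round(familywise_error(50, 0.05), 3), round(familywise_error(50, 0.001), 3))
```

The correction treats the tests as independent, which is conservative when SNP pairs are correlated through LD.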
Genome-wide epistasis screen in bipolar disorder
[Figure: Chromosome 13 against chromosomes 1 to 22]
[Figure: one gene with 10 variants labelled A to J and another with 8 variants labelled 1 to 8. All pairs A1, A2, ..., J8 give 80 allele-based tests, compared with a single gene-based test.]
Gene-environment
Science 2003, 301: 306
Gene-environment
The Journal of Nutrition 2002, 8S: 132
Gene-Gene
Nature 2005, 436: 701
Further reading
• Cordell HJ (2002) Human Molecular Genetics 11: 2463-2468.
– a statistical review of epistasis, methods and definitions
• Clayton D & McKeigue P (2001) The Lancet, 358, 1357-60.
– a critical appraisal of GxE research
• Marchini J, Donnelly P & Cardon LR (2005) Nature Genetics, 37, 413-417
– epistasis in whole-genome association studies