Binnaz Yalçın, Jan Fullerton, Sue Miller, Richard Copley, Richard Mott and Jonathan Flint
Complex Trait Consortium Meeting, Oxford, July 1st 2003
These mice have gone off their cheese… A genetic basis for depression?
Anxiety susceptibility in the HS mice: How far are we from discovering a QTG?
[Figure: -log P value (0 to 9) against distance (0 to 1.4 cM) for markers D1Mit264, D1Mit394, D1Imm103, D1Mit100, D1Mit423, D1Mit198, D1Mit194, D1Mit102, D1Mit289 and D1Mit369; the 95% CI spans 0.8 cM]
Fine-resolution mapping on mouse chromosome 1
[Figure: integrated map of mouse chromosome 1 with cR, cM, Mb (143.0 to 148.0) and FISH scales, showing markers D1Mit423, D1Mit100, D1Mit499, D1Mit395, D1Mit101, D1Mit264, D1Mit194, D1Mit102 and D1Mit198; the 0.8 cM interval is indicated]
A 4.8-Mb high-resolution integrated BAC-based map
[Figure: the same map of mouse chromosome 1 (cR, cM, Mb and FISH scales, D1Mit markers, 0.8 cM interval) with the BAC contig added. BAC clones: 436B15, 146B4, 278M14, 305E1, 134A16, 445F7, 90A8, 238L2, 132C16, 185E17, 447B15, 4K20, 431N20, 231L2, 459A11, 238K21, 278P12, 329H3, 101B24, 7I3, 220K2, 174G1, 206E19, 285F13, 129N3, 480H2, 282N6, 212I24, 368O20, 311I21 and 37J4, together with markers derived from them such as 278M14S and 212I24T]
Approaches used to identify genes
1. Find expressed sequence tags (ESTs) using BLAST alignment
2. Compare with other species
How many genes would you expect in a 4.8-Mb region?
Only 10 expressed sequences found
[Figure: positions of the expressed sequences along the 143.0 to 148.0 Mb interval]
• 2 unknown ESTs (B302775 and B830045N13), homologues of CDC73 and retinoic acid inducible neural specific protein, respectively.
• B3Galt2 (Beta 1,3-Galactosyltransferase 2).
• Glrx2 (Glutaredoxin 2), also known as thioltransferase.
• SSA2 (Sjögren Syndrome Autoantigen).
• UCHL5 (Ubiquitin C-Terminal Hydrolase L5).
• 4 RGS genes (Regulator of G protein Signalling): RGS18, RGS2, RGS13 and RGS1.
Have we missed any expressed sequences?
The Fugu genome is ideal for gene discovery in vertebrates
● It contains a similar number of genes to mammalian genomes, with short intergenic regions.
● It spans 365 Mb, which has been sequenced to over 95% coverage.
Mouse-Fugu comparison
● The 4.8 Mb region was aligned to the whole Fugu genome.
● Significant hits were identified.
● Are there any new matches that could be explained by unidentified expressed sequences?
● All the hits found correspond to the genes previously identified.
● We haven't missed any coding sequence.
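The check described in the Mouse-Fugu comparison, whether any Fugu hit falls outside the genes already known, amounts to an interval-overlap test. Below is a minimal Python sketch of that idea. It assumes hits and genes are available as (start, end) coordinates in Mb; the coordinates, the gene names and the function `unexplained_hits` are hypothetical and are not taken from the talk.

```python
def overlaps(a, b):
    """Return True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def unexplained_hits(hits, genes):
    """Return the hits that do not overlap any known gene interval."""
    return [h for h in hits if not any(overlaps(h, g) for g in genes.values())]


# Hypothetical coordinates (Mb), for illustration only.
known_genes = {"geneA": (143.10, 143.15), "geneB": (145.40, 145.48)}
fugu_hits = [(143.11, 143.12), (145.41, 145.42), (146.90, 146.91)]

# Hits left over here would point to candidate unidentified expressed sequences.
print(unexplained_hits(fugu_hits, known_genes))
```

In the analysis reported here the equivalent list was empty: every hit corresponded to a gene already identified.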
Are there any variants in these genes?
Identification of variants
● We sequenced all the previously identified genes in each of the HS founder strains and in 12 HS mice.
● Coding sequences were covered for all the genes.
● All RGS genes were fully sequenced, including 4 Kb in the 5’ UTR and 2 Kb in the 3’ UTR.
Sequencing results for RGS2, RGS1 and RGS18
[Figure: for each gene, tracks showing structure (exons 1 to 5), sequencing coverage, polymorphisms (SNP, del/ins, repeats) and coding variants against a scale in Kb]
● RGS18: 22592 bp, 49 polymorphisms, 100% coverage.
● RGS2: 7145 bp, 22 polymorphisms, 100% coverage.
● RGS1: 7368 bp, 96 polymorphisms, 100% coverage.
Summary of gene sequencing
• We sequenced 100 Kb in each of the 8 HS founders and in 12 HS mice.
• We found 296 polymorphisms.
• 81% were SNPs, 13% repeats and 6% ins/del.
• Average polymorphism rate is 1 per 200 bp.
• We observed segments of high (1 per 50 bp) and low (1 per 500 bp) polymorphism rates.
• All the polymorphisms found in the HS founders are also present in the HS mice.
Symbol       Length (bp)  Coverage (%)  Total variants  5' UTR  Intronic  Coding
BC027756           93399            10               6                 6
B3Galt2             1300           100               0                          0
Glrx2               7150            50               8                 7       1
SSA2               21407            50              20                17       3
UCHL5              29325            35              11                11
B830045N13        387717             3               7                 7
RGS2                7145           100              22      13         9
RGS13              45502           100              77      18        59
RGS1                7368           100              96      39        53       4
RGS18              22592           100              49      10        39
Total                                              296      80       208       8
Coding variants identified in 10 genes
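As a rough illustration of how polymorphism density differs between genes, the sketch below divides gene length by the number of variants for the four RGS genes, which the table reports at 100% coverage. The lengths and counts come from the table; the dictionary and function names are hypothetical, and the calculation is only an average over each gene, not the segment-level analysis behind the talk's figures.

```python
# Length (bp) and total variants for the RGS genes, each sequenced at
# 100% coverage according to the table in this talk.
rgs_genes = {
    "RGS2": (7145, 22),
    "RGS13": (45502, 77),
    "RGS1": (7368, 96),
    "RGS18": (22592, 49),
}


def bp_per_variant(length_bp, n_variants):
    """Average spacing between variants, in bp."""
    return length_bp / n_variants


spacing = {gene: bp_per_variant(*vals) for gene, vals in rgs_genes.items()}

# List genes from most to least polymorphic.
for gene, bp in sorted(spacing.items(), key=lambda kv: kv[1]):
    print(f"{gene}: about 1 variant per {bp:.0f} bp")
```

On these figures RGS1 is the most polymorphic of the four and RGS13 the least, which fits the observation that polymorphism rates vary between segments.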
Does the variant alter protein function?
Gene   Exon  Variant  Polyphen  SIFT
Glrx2  2     I20I     Silent    Silent
SSA2   2     V167A    Benign    Tolerant
SSA2   8     A461T    Benign    Tolerant
SSA2   8     V465I    Benign    Tolerant
RGS1   1     F6F      Silent    Silent
RGS1   3     I60M     Benign    Tolerant
RGS1   4     R88K     Benign    Tolerant
RGS1   5     K186K    Silent    Silent
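The "Silent" entries in the table can be read straight off the variant notation: a variant such as I20I keeps the same amino acid, whereas V167A replaces it. The following Python sketch separates the two classes for the eight coding variants listed. The function name `is_silent` is hypothetical, and the sketch does not reproduce the PolyPhen or SIFT predictions, which assess the amino-acid changes themselves.

```python
import re


def is_silent(variant):
    """True if a variant such as 'I20I' keeps the same amino acid."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unrecognised variant notation: {variant}")
    ref, _position, alt = match.groups()
    return ref == alt


# The eight coding variants from the table.
coding_variants = ["I20I", "V167A", "A461T", "V465I", "F6F", "I60M", "R88K", "K186K"]

silent = [v for v in coding_variants if is_silent(v)]
missense = [v for v in coding_variants if not is_silent(v)]
print("silent:", silent)
print("amino-acid changes:", missense)
```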
Summary
● The 0.8 cM interval contains 4.8 Mb of DNA.
● 10 genes were identified in 4.8 Mb.
● 3 genes have coding variants, none of which are predicted to alter the gene’s function.
● We cannot find any mutations that disrupt gene function.
How can we identify functionally important non-coding variants?
Mouse-human comparison
● We found over 600 conserved non-coding regions, defined as at least 70% identity over 100 bp.
Sequencing conserved non-coding regions
● We sequenced 20% of the conserved non-coding regions, representing 120 Kb of sequence in each of the HS founder strains.
● Extrapolating, we predicted over 1000 polymorphisms in the conserved non-coding regions of the 4.8 Mb region.
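The conservation criterion quoted in this section (70% identity over 100 bp) can be illustrated with a sliding window over a pairwise alignment. The sketch below is a minimal Python version under stated assumptions: the two sequences are already aligned and of equal length, gaps are written as "-", and the function name `conserved_windows` is hypothetical. The talk does not say which alignment tool or scoring was used, so this is not the authors' pipeline.

```python
def conserved_windows(seq_a, seq_b, window=100, min_identity=0.70):
    """Return start indices of windows whose identity is >= min_identity.

    seq_a and seq_b must be two rows of a pairwise alignment (equal length);
    a column counts as identical only if both rows carry the same base.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(a == b and a != "-") for a, b in zip(seq_a, seq_b)]
    starts = []
    if len(matches) < window:
        return starts
    count = sum(matches[:window])  # matches in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide the window by one column: add the new column, drop the old.
            count += matches[start + window - 1] - matches[start - 1]
        if count / window >= min_identity:
            starts.append(start)
    return starts


# Toy example with a 10-column window: 7 of 10 columns match.
print(conserved_windows("AAAAAAAAAA", "AAAAAAATTT", window=10))
```

Overlapping windows that pass the threshold would normally be merged into a single conserved region before counting; that step is left out here.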
What is the arrangement of polymorphisms across the genomes of the 8 HS founders?
Polymorphisms found in the HS founders
● Primers spaced on average every 5-10 Kb.
● All polymorphisms detected by sequencing.
● 1219 polymorphisms found, including 76% SNPs, 14% del/ins and 10% repeat polymorphisms.
● Average polymorphism density is 1 per 5 Kb.
Examples of pairwise comparison of inbred strains
[Figure: three panels (AJ/C57, BALB/C57 and I/RIII) plotting number of variants per 100 Kb (0 to 100; 0 to 120 for I/RIII) against physical distance (0 to 50)]
Summary of variants found
● 8 in coding regions.
● 80 in 5’ UTR.
● 208 in introns.
● 1000 in conserved non-coding regions.
● 713 in non-conserved regions.
What is the probability that a variant influences the phenotype?
Assigning probabilities to variants
● We originally identified the QTL by testing for differences between the 8 HS founder strains, allowing each strain to have a different trait value.
● A SNP, however, divides the founder strains into two groups.
● If the SNP is the QTN, then a test that forces the strains within each group to share the same trait value will fit as well as the original test.
● If that test is non-significant, we can exclude the SNP as a candidate.
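The merging idea can be sketched as a comparison of two nested models: one mean per founder strain against one mean per SNP allele. The Python example below is a simplified illustration only. It assumes each animal's founder strain at the locus is known with certainty, it compares residual sums of squares rather than running a significance test, and the trait values are invented for illustration. It is not the computation performed by the HAPPY software, and the function names are hypothetical. The alleles are those listed for the RGS2 SNP at position -3035 in this talk.

```python
from collections import defaultdict


def rss_by_group(values, groups):
    """Residual sum of squares when each group gets its own mean."""
    grouped = defaultdict(list)
    for value, group in zip(values, groups):
        grouped[group].append(value)
    rss = 0.0
    for members in grouped.values():
        mean = sum(members) / len(members)
        rss += sum((v - mean) ** 2 for v in members)
    return rss


def merge_by_allele(strains, allele_of):
    """Replace each strain label by the SNP allele that strain carries."""
    return [allele_of[s] for s in strains]


# Alleles per founder for the RGS2 SNP at position -3035.
allele_of = {"A/J": "G", "AKR": "G", "BALB": "A", "C3H": "G",
             "C57": "A", "DBA": "A", "I": "A", "RIII": "A"}

# Toy trait values, invented for illustration: two animals per founder.
strains = ["A/J", "A/J", "AKR", "AKR", "BALB", "BALB", "C3H", "C3H",
           "C57", "C57", "DBA", "DBA", "I", "I", "RIII", "RIII"]
trait = [5.1, 4.9, 5.0, 5.2, 2.1, 1.9, 4.8, 5.0,
         2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 2.2]

rss_full = rss_by_group(trait, strains)  # eight strain means
rss_merged = rss_by_group(trait, merge_by_allele(strains, allele_of))  # two allele means
print(rss_full, rss_merged)
```

If the merged model fits almost as well as the eight-strain model, the SNP's grouping accounts for the strain differences; a much worse fit argues against that SNP being the QTN.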
Mouse RGS2: sequencing of inbred strains and HAPPY P-values
Region     Position Type      A/J  AKR  BALB C3H  C57  DBA  I    RIII P-value
5' UTR     -3045    (G)n      x8   x8   x11  x8   x11  x11  x11  x11  6.46E-10
5' UTR     -3035    SNP       G    G    A    G    A    A    A    A    6.46E-10
5' UTR     -2986    (T)n      x8   x8   x9   x8   x9   x9   x9   x9   6.46E-10
5' UTR     -2951    SNP       G    G    A    G    A    A    A    A    6.46E-10
5' UTR     -2854    (GTTTT)n  x5   x5   x6   x5   x6   x6   x6   x6   6.46E-10
5' UTR     -2545    SNP       G    G    C    G    C    C    C    C    6.46E-10
5' UTR     -2359    (T)+      yes  yes  no   yes  no   no   no   no   6.46E-10
5' UTR     -2347    (CG)+     yes  no   no   yes  no   no   no   no   1.83E-06
5' UTR     -2117    SNP       G    G    A    G    A    A    A    A    6.46E-10
5' UTR     -1973    SNP       G    G    T    G    T    T    T    T    6.46E-10
5' UTR     -1888    SNP       A    A    C    A    C    C    C    C    6.46E-10
5' UTR     -1673    (A)n      x14  x14  x20  x14  x20  x20  x20  x20  6.46E-10
5' UTR     -1916    SNP       G    G    A    G    A    A    A    A    6.46E-10
Intron1-2  192      SNP       T    T    C    C    C    C    C    C    5.37E-03
Intron1-2  241      SNP       T    T    C    T    C    C    C    C    6.46E-10
Intron1-2  267      SNP       C    C    T    C    T    T    T    T    6.46E-10
Intron1-2  653      (CA)n     x11  x11  x24  x11  x24  x24  x24  x24  6.46E-10
Intron1-2  1058     (T)n      x9   x9   x10  x9   x10  x10  x10  x10  6.46E-10
Intron2-3  1266     SNP       G    G    T    G    T    T    T    T    6.46E-10
Intron3-4  1711     (T)n      x4   x4   x5   x4   x5   x5   x5   x5   6.46E-10
Intron3-4  1750     SNP       T    T    C    T    C    C    C    C    6.46E-10
Intron4-5  2159     SNP       A    A    G    A    G    G    G    G    6.46E-10
3' UTR     3297     SNP       A    A    G    A    G    G    G    G    6.46E-10
Mouse RGS13: sequencing of inbred strains and HAPPY P-values
Region     Position Type      A/J  AKR  BALB C3H  C57  DBA  I    RIII P-value
5' UTR     -4922    (A)n      x8   x9   x10  x8   x10  x10  x10  x8   9.03E-01
5' UTR     -4697    SNP       T    C    T    T    T    T    T    T    2.95E-01
5' UTR     -4062    (A)n      x4   x8   x6   x4   x6   x6   x6   x4   9.03E-01
5' UTR     -4042    (CAAA)n   x5   x4   x5   x5   x5   x5   x5   x7   4.62E-03
5' UTR     -4027    (A)n      x13  x8   x13  x13  x13  x13  x13  x5   4.62E-03
5' UTR     -4026    (C)-      no   yes  no   no   no   no   no   no   2.95E-01
5' UTR     -3820    SNP       C    T    C    C    C    C    C    C    2.95E-01
5' UTR     -3725    SNP       C    G    C    C    C    C    C    C    2.95E-01
5' UTR     -3566    SNP       A    G    A    A    A    A    A    A    2.95E-01
5' UTR     -3374    SNP       G    A    G    G    G    G    G    G    2.95E-01
5' UTR     -3284    SNP       G    A    G    G    G    G    G    G    2.95E-01
5' UTR     -2778    SNP       T    T    G    T    G    G    G    T    3.40E-01
5' UTR     -2754    SNP       G    A    G    G    G    G    G    G    2.95E-01
5' UTR     -2665    (TAGA)n   x7   x4   x4   x7   x4   x4   x4   x7   7.80E-01
5' UTR     -2524    SNP       T    T    C    T    C    C    C    T    3.40E-01
5' UTR     -2181    SNP       A    A    T    A    T    T    T    A    3.40E-01
5' UTR     -1947    SNP       T    C    C    T    C    C    C    T    7.80E-01
5' UTR     -1655    SNP       C    T    C    C    C    C    C    C    2.95E-01
5' UTR     -613     (CA)n     x7   x8   x7   x7   x7   x7   x7   x7   2.95E-01
HAPPY results across our whole region
[Figure: LogP (0 to 25) against physical distance (142.5 to 147.5 Mb), shown first without and then with gene positions marked: SSA2, B3Galt2, Glrx2, UCHL5, RGS2, RGS1, RGS18, RGS13 and the two ESTs (labelled B8 and B3)]
Most significant SNPs lie within a conserved non-coding region
How many variants could we exclude?
● We can exclude the 77% of identified SNPs that are not significant.
● Among coding variants, none is significant.
● Among 5’ UTR variants, 17 are significant.
● We can exclude a further 13% that lie in non-conserved regions.
● This leaves 120 SNPs identified as significant.
Conclusions
● No coding variant is an obvious candidate for the QTN.
● Haplotype analysis can help limit the search but involves immense amounts of sequencing.
● There may not be a single responsible variant.
● One region, 5’ of the RGS18 gene, contains the most significant SNPs, within a conserved non-coding region.
Acknowledgements
● Jonathan Flint
● Richard Mott
● Jan Fullerton
● Sue Miller
● Andrew Morris
● Richard Copley
● John Broxholme