The Ashkenazi Genome Project
Transcript of The Ashkenazi Genome Project
![Page 1: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/1.jpg)
The Ashkenazi Genome Project
Shai Carmi, Pe’er lab, Columbia University
Joint Group Meeting, November 2012
![Page 2: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/2.jpg)
Recent History of Ashkenazi Jews
• Mediterranean origin (?)
• Ca. 1000: small communities in N. France and the Rhineland
• Migration east
• Expansion
• ~10M today, mostly in the US and Israel
• Relative isolation
![Page 3: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/3.jpg)
Ashkenazi Jewish Genetics
Behar et al., Nature 2010. Bray et al., PNAS 2010. Guha et al., Genome Biology 2012.
300 Jewish individuals; SNP arrays.
• Recently, AJ were shown to be a genetically distinct group
• Close to Middle-Eastern and South-European populations
Price et al., PLoS Genetics 2008. Olshen et al., BMC Genetics 2008. Need et al., Genome Biology 2009. Kopelman et al., BMC Genetics 2009.
[Figure (Atzmon et al., AJHG 2010): AJ; Jewish non-AJ; Middle-Eastern; Europeans]
![Page 4: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/4.jpg)
Recent Demography & IBD
In small populations, common ancestors are likely recent.
[Figure: individuals A and B and their ancestors over generations 1 to 3]
![Page 5: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/5.jpg)
Recent Demography & IBD
In small populations, common ancestors are likely recent.
• For a g-generation ancestor, the chance of IBD is small, but the shared segment is long: about 1/(2g) Morgans on average.
• IBD is highly informative about recent history!
[Figure: individuals A and B share a segment inherited from a common ancestor; many long haplotypes are identical-by-descent]
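Under a standard recombination model, the 2g meioses that link A and B through a g-generation ancestor each add crossovers, so the breakpoints pooled over all of them are spaced about 1/(2g) Morgans apart. The small simulation below is a sketch of that reasoning, assuming crossovers follow a Poisson process at rate 1 per Morgan per meiosis (no interference) and ignoring which pieces are actually transmitted; the 100-Morgan chromosome length is a hypothetical value chosen only to collect many segments.

```python
import random

def simulate_segment_lengths(g, chrom_morgans=100.0, seed=0):
    """Distances (in Morgans) between consecutive crossover breakpoints pooled
    over the 2g meioses separating two haplotypes from a g-generation ancestor.

    Assumption: crossovers form a Poisson process with rate 1 per Morgan per
    meiosis, so pooled breakpoints form a Poisson process with rate 2g.
    chrom_morgans is a hypothetical length used only to get many segments.
    """
    rng = random.Random(seed)
    breakpoints = []
    for _ in range(2 * g):
        pos = rng.expovariate(1.0)  # first crossover in this meiosis
        while pos < chrom_morgans:
            breakpoints.append(pos)
            pos += rng.expovariate(1.0)  # gap to the next crossover
    breakpoints.sort()
    return [b - a for a, b in zip(breakpoints, breakpoints[1:])]

if __name__ == "__main__":
    for g in (5, 10, 20):
        lengths = simulate_segment_lengths(g)
        mean_len = sum(lengths) / len(lengths)
        print(f"g={g}: simulated mean {mean_len:.4f} M, 1/(2g) = {1 / (2 * g):.4f} M")
```

Older ancestors (larger g) leave shorter pieces, which is why long shared segments point to recent common ancestry.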
![Page 6: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/6.jpg)
Formal Inference Using IBD
• Assume a population of historical size N(t).
• Derive the expected total of shared segments with length in a given range as a function of N(t).
[Figure: individuals A and B share a segment (Palamara et al., AJHG 2012)]
IBD sharing is abundant in AJ (Atzmon et al., AJHG 2012; Gusev et al., MBE 2011).
• Detect IBD in the sample, then infer the history N(t).
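For the simplest case of a constant population of N diploids, the "detect IBD, infer history" step can be worked through in closed form. This is an illustrative derivation under stated assumptions, not the variable-N(t) method of Palamara et al.: the pairwise coalescence time T is exponential with mean 2N generations, and given T the segment around a site has length Gamma(shape 2, rate 2T) Morgans. Integrating P(length ≥ u | T) over T gives a fraction of genome shared in segments of at least u Morgans equal to (1 + 8Nu) / (1 + 4Nu)², which decreases monotonically in N, so an observed fraction can be inverted by bisection.

```python
import math

def frac_shared_at_least(n_diploid, u_morgans):
    """Expected fraction of the genome two haplotypes share IBD in segments
    of length >= u (Morgans), for a constant-size population.

    Assumptions (illustrative sketch): T ~ Exp(mean 2N) generations and,
    given T, segment length ~ Gamma(shape=2, rate=2T). Closed form:
    (1 + 8Nu) / (1 + 4Nu)^2.
    """
    x = 4.0 * n_diploid * u_morgans
    return (1.0 + 2.0 * x) / (1.0 + x) ** 2

def infer_constant_n(observed_frac, u_morgans, lo=1.0, hi=1e9, iters=200):
    """Find the constant N whose expected sharing matches observed_frac.

    Sharing decreases monotonically with N, so bisection on log(N) works
    as long as observed_frac lies between the values at lo and hi.
    """
    log_lo, log_hi = math.log(lo), math.log(hi)
    for _ in range(iters):
        mid = 0.5 * (log_lo + log_hi)
        if frac_shared_at_least(math.exp(mid), u_morgans) > observed_frac:
            log_lo = mid  # model predicts too much sharing: N must be larger
        else:
            log_hi = mid
    return math.exp(0.5 * (log_lo + log_hi))

if __name__ == "__main__":
    u = 0.03  # hypothetical 3 cM length cutoff
    f = frac_shared_at_least(1000, u)
    print(f"N=1000: expected sharing {f:.5f}; recovered N {infer_constant_n(f, u):.1f}")
```

Recovering a time-varying N(t), as on the slide, needs the full length distribution rather than a single cutoff; the constant-size case only shows why more sharing implies a smaller population.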
![Page 7: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/7.jpg)
AJ Genetic History
Expansion rate ≈34% per generation
[Figure (Palamara et al., AJHG 2012): effective size N vs. time t; labeled values 2,300, 45,000, 270 and 4,300,000; 800 years ago; present]
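The quoted growth rate can be checked with compound-growth arithmetic. The sketch below assumes, as a hypothetical reading of the figure, that the expansion runs from the 270 label to the 4,300,000 label at a constant rate; it also assumes about 25 years per generation, which the slide does not state. Under those assumptions, 34% per generation takes about 33 generations, roughly 800 years, in line with the figure's 800-years-ago label.

```python
import math

def generations_to_grow(n_start, n_end, rate_per_generation):
    """Generations needed to grow from n_start to n_end at a constant
    per-generation rate (0.34 means 34% per generation)."""
    return math.log(n_end / n_start) / math.log(1.0 + rate_per_generation)

def implied_rate(n_start, n_end, generations):
    """Constant per-generation growth rate linking n_start to n_end."""
    return (n_end / n_start) ** (1.0 / generations) - 1.0

if __name__ == "__main__":
    # Hypothetical reading of the figure: 270 -> 4,300,000 at 34% per generation.
    g = generations_to_grow(270, 4_300_000, 0.34)
    years_per_generation = 25  # assumption, not from the slide
    print(f"~{g:.1f} generations, ~{g * years_per_generation:.0f} years")
```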
High potential for genetic studies!
[Figure: Power of imputation by IBD; % additional information potential (0 to 100%) vs. number of sequenced individuals (0 to 500); cohorts WTCCC, AJ_SCZ, AJUK]
![Page 8: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/8.jpg)
The Ashkenazi Genome Consortium
10 labs from the NY area and Israel.
Goals:
• Sequence hundreds of healthy AJ to high coverage
o Use as a reference panel for association studies, imputation, and clinical interpretation
o Understand AJ population history
o Understand AJ functional genetic variation (negative/positive selection)
![Page 9: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/9.jpg)
The Ashkenazi Genome Consortium
Phase I:
• 144 AJ personal genomes
• ~60 years old, healthy controls
• Unrelated, PCA-validated AJ
• Selected to maximize sharing with the rest of the cohort
• Technology: Complete Genomics
• Sequenced so far: ~100 genomes
• Data presented: 58 genomes
Phase II:
• Hundreds of genomes (2013?)
• More collaborators
![Page 10: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/10.jpg)
Quality Measures

| Property | Genome (exome) |
|---|---|
| Coverage | ~55x |
| Fraction called | 96.5±0.003% (98%) |
| Fraction with coverage > 20x | 92.4±0.018% (94.9%) |
| Concordance with SNP array | 99.87±0.1% |
| Ti/Tv ratio | 2.14±0.003 (3.05) |
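The Ti/Tv ratio is the number of transitions (A↔G, C↔T) divided by the number of transversions among single-nucleotide variants. A minimal sketch of the computation, assuming variants arrive as (ref, alt) base pairs; the input format is hypothetical, and indels and MNPs are simply skipped.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def is_transition(ref, alt):
    """Purine<->purine or pyrimidine<->pyrimidine changes are transitions."""
    return ref != alt and ({ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES)

def ti_tv_ratio(snvs):
    """snvs: iterable of (ref, alt) strings (hypothetical input format).
    Non-SNV records (indels, MNPs, non-ACGT bases) are ignored."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt or not {ref, alt} <= BASES:
            continue  # skip anything that is not a single-base substitution
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")
```

The table's higher exome value (3.05 vs. 2.14 genome-wide) is the kind of difference this ratio is used to check.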
![Page 11: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/11.jpg)
Variant statistics

| Statistic | Per genome (exome) |
|---|---|
| Total SNPs | 3.4M (22k) |
| Novel SNPs | 3.7% (4%) |
| Het/hom ratio | 1.64 (1.67) |
| Insertions | 223k (246) |
| Deletions | 237k (218) |
| Multi-nucleotide variants | 83k (374) |
| Synonymous SNPs | 10,525 |
| Non-synonymous SNPs | 9,695 |
| Nonsense SNPs | 71 |
| Other disrupting | 241 |
| CNVs | 336 |
| SVs | 1,486 |
| MEIs | 3,475 |
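The het/hom ratio counts heterozygous calls against homozygous non-reference calls in one genome. A sketch assuming VCF-style genotype strings (the input format is an assumption; Complete Genomics output would first need conversion), skipping no-calls and half-calls.

```python
def het_hom_ratio(genotypes):
    """genotypes: iterable of VCF-style GT strings such as '0/1', '1|1', './.'.
    Returns heterozygous / homozygous-alternate counts; homozygous-reference,
    no-calls and half-calls (e.g. './1') are not counted."""
    het = hom_alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip haploid calls, no-calls and half-calls
        a, b = alleles
        if a != b:
            het += 1  # includes multi-allelic hets such as '1/2'
        elif a != "0":
            hom_alt += 1
    return het / hom_alt if hom_alt else float("inf")
```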
![Page 12: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/12.jpg)
Comparison to Europeans
[Figure: TAGC vs. Flemish genomes; panels: all SNPs (M), het/hom ratio, insertions/deletions/MNPs (k); novel SNPs (%), TAGC vs. CEU, extrapolated to 100% of the genome]
Similar results in 13 public European genomes sequenced by Complete Genomics (CG).
![Page 13: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/13.jpg)
Het/Hom Ratio
• The difference is significant relative to both the Flemish and the HapMap EU genomes.
• Was observed in SNP arrays (Need et al., Genome Biology 2009).
• Did I not just say that AJ have more IBD?
![Page 14: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/14.jpg)
[Figure: AJ vs. EU population histories over time t, years ago to present; annotations: IBD observed, het/hom ratio]
![Page 15: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/15.jpg)
Data Flow Pipeline
[Figure: pipeline diagram with steps Backup 3x; CGA tools; VCF; testVariants; Fix; Plink/Seq; QC; Plink; Compress, index; Phase; Distribute]
![Page 16: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/16.jpg)
Quality Control
False-positive (FP) rate assessment using runs of homozygosity (ROH):
• Assume heterozygous calls inside high-confidence ROH are FP.
[Figure: identical paternal and maternal haplotypes across an ROH, with het calls inside it]
• High-confidence ROH only (>7.5 Mb, no gaps).
• 7 segments in 7 individuals (72 Mb total).
• Count het SNPs in the original files.
• Genome-wide extrapolation: ~20,000 FP SNPs per genome.
• ~3-5% FP rate for indels.
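The ROH-based estimate amounts to counting hets inside qualifying segments and scaling the per-base rate to the genome. A sketch under stated assumptions: one chromosome at a time, segments given as (start, end, has_gap) with inclusive ends, and a ~3 Gb genome length; these input formats and the genome length are assumptions, not part of the original pipeline. For instance, a hypothetical 480 hets in 72 Mb would extrapolate to 20,000 per genome.

```python
from bisect import bisect_left, bisect_right

def count_hets_in_roh(roh_segments, het_positions, min_len_bp=7_500_000):
    """Count het calls inside high-confidence runs of homozygosity.

    roh_segments: (start, end, has_gap) tuples on one chromosome, end inclusive
    (hypothetical format). het_positions: sorted het positions on that
    chromosome. Only gap-free segments longer than min_len_bp are used.
    Returns (hets inside those segments, their total length in bp).
    """
    hets = total_bp = 0
    for start, end, has_gap in roh_segments:
        if has_gap or end - start <= min_len_bp:
            continue
        hets += bisect_right(het_positions, end) - bisect_left(het_positions, start)
        total_bp += end - start
    return hets, total_bp

def extrapolate_fp(hets_in_roh, roh_bp, genome_bp=3_000_000_000):
    """Treat every het inside the ROH as a false positive and scale the
    per-base rate to the whole genome (genome_bp ~3 Gb is an assumption)."""
    return hets_in_roh / roh_bp * genome_bp
```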
![Page 17: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/17.jpg)
Quality Control
Remove:
• Indels and MNPs
• Low-quality SNPs
• Multi-allelic SNPs
• Half-calls
• SNPs with a high no-call rate
• SNPs not in HWE
• Monomorphic reference SNPs
• Inbred individual
FP after QC: ~5,000 per genome.
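The "SNPs not in HWE" filter needs a Hardy-Weinberg test per SNP. The slide does not say which test was used; the sketch below is a one-degree-of-freedom chi-square test from genotype counts, chosen for brevity (an exact test is often preferred for small samples such as 58 genomes).

```python
import math

def hwe_chi2_pvalue(n_hom_ref, n_het, n_hom_alt):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype
    counts. Small p-values flag SNPs for removal. This is an illustrative
    choice of test, not necessarily the one used in the original QC."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))
```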
![Page 18: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/18.jpg)
Applicability to Clinical Genomics
• Variants of unknown significance:
– Technical false positives
– True variants without health impact
[Figure: novel variants per sample, total (0 to 140,000) and non-synonymous (0 to 600); bars: all, after QC, not in panel (not in TAGC)]
![Page 19: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/19.jpg)
Phasing
• Sequencing is in mate pairs
• Haplotype information available for ~30-35% of hets
• BEAGLE error rate: 3-4%
• Seqphase: new phasing tool
– Based on SHAPEIT
– Incorporates reads
– 18 hours on chromosome 1
[Figure: frequency distribution of the distance between phased hets (axis 100 to 500)]
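The slide does not define the "error rate" for BEAGLE; one common phasing metric is the switch error rate, assumed here. It counts how often the relative phase of consecutive heterozygous sites differs between the inferred and true haplotypes. The 0/1 encoding of which allele sits on haplotype 1 is a hypothetical input format.

```python
def switch_error_rate(true_hap, inferred_hap):
    """true_hap, inferred_hap: 0/1 sequences giving, at each heterozygous site,
    which allele lies on haplotype 1 (hypothetical encoding).
    Returns switches / (number of consecutive het pairs)."""
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same het sites")
    if len(true_hap) < 2:
        return 0.0
    # 1 where the inferred phase agrees with the truth, 0 where it is flipped
    agree = [int(t == i) for t, i in zip(true_hap, inferred_hap)]
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(agree) - 1)
```

A haplotype that is flipped everywhere scores zero switches, because only the relative phase between neighbouring hets matters.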
![Page 20: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/20.jpg)
Variant Discovery
• Number of non-reference variants
• Extrapolation using Gravel et al., PNAS 2011
![Page 21: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/21.jpg)
Variant Discovery
• Number of segregating sites S_n(t) and heterozygosity H(t) (Zivkovic and Stephan, Theor. Pop. Biol. 2011)
• N(t): number of diploids at time t; N = N(t=0); ρ(t) = N(t)/N; n: number of diploid samples
• t: number of generations / 2N; θ = 4Nμ; μ: mutation rate per generation
• Use the double expansion model of Palamara et al., AJHG 2012
• Define t = 0 at the start of the first expansion
• Match H(t)
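As a point of reference for variant discovery, the classic constant-size neutral expectation (Watterson) for 2n sampled chromosomes is E[S] = θ · Σ_{i=1}^{2n-1} 1/i, with θ = 4Nμ for the region considered. This sketch is only that baseline, not the variable-size Zivkovic and Stephan formulas referred to above; population growth is expected to add rare variants beyond it, so discovery in an expanded population should keep rising faster than this curve.

```python
def harmonic(m):
    """Sum of 1/i for i = 1..m."""
    return sum(1.0 / i for i in range(1, m + 1))

def expected_segregating_sites(theta, n_diploid):
    """Constant-size (Watterson) expectation E[S] = theta * sum_{i=1}^{2n-1} 1/i
    for 2n sampled chromosomes; theta = 4 N mu for the region considered."""
    return theta * harmonic(2 * n_diploid - 1)

def expected_new_sites(theta, n_diploid):
    """Sites expected to be seen for the first time when the sample grows
    from n to n + 1 diploids: theta * (1/(2n) + 1/(2n + 1))."""
    return (expected_segregating_sites(theta, n_diploid + 1)
            - expected_segregating_sites(theta, n_diploid))
```

Under constant size each added genome contributes fewer new sites, roughly θ/n, so the total grows only logarithmically with sample size.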
![Page 22: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/22.jpg)
Variant Discovery
?
![Page 23: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/23.jpg)
Allele Frequency Spectrum
[Figure: allele frequency spectrum for all variants and for population-specific variants, shown as counts and as fractions]
![Page 24: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/24.jpg)
Demographic Inference
• Folded allele frequency spectrum + coalescent simulations
• Double expansion model + ancient AJ foundation bottleneck
• Find the maximum likelihood solution (Gutenkunst et al., PLoS Genet. 2009):
– Average over simulations to obtain the expected spectrum
– Assume each mutation's frequency is drawn according to the expected spectrum
– Approximate the multinomial probability as Poisson
[Figure: folded spectrum, % of sites on a log scale (0.1 to 100)]
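The likelihood step above can be sketched as follows: fold the spectrum by minor-allele count, scale the simulated expected spectrum to the observed number of sites (the maximum-likelihood choice of the overall scale under a Poisson model), and sum Poisson log-probabilities over bins. The per-site alternate-allele-count input is a hypothetical format, and the coalescent simulations that produce the expected spectrum are not shown.

```python
import math

def folded_sfs(alt_counts, n_chrom):
    """Folded spectrum from per-site alternate-allele counts.
    Entry k-1 counts sites whose minor allele is seen k times,
    for k = 1..n_chrom // 2; monomorphic sites are dropped."""
    sfs = [0] * (n_chrom // 2 + 1)
    for count in alt_counts:
        minor = min(count, n_chrom - count)
        if minor > 0:
            sfs[minor] += 1
    return sfs[1:]

def poisson_loglik(observed, expected_shape):
    """Poisson log-likelihood of an observed folded spectrum given an
    expected spectrum (e.g. averaged over simulations of a candidate model),
    rescaled so the expected number of sites equals the observed total."""
    total_obs = sum(observed)
    total_exp = sum(expected_shape)
    ll = 0.0
    for obs, exp_val in zip(observed, expected_shape):
        lam = total_obs * exp_val / total_exp
        if lam <= 0:
            if obs > 0:
                return float("-inf")  # model rules out an observed bin
            continue
        ll += obs * math.log(lam) - lam - math.lgamma(obs + 1)
    return ll
```

Maximizing this over the model parameters (sizes and times) is what picks the demographic solution; each candidate requires its own set of simulations.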
![Page 25: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/25.jpg)
Demographic Inference
• Similar to Palamara et al., with somewhat larger population sizes.
• To do: Gene flow from EU; better inference tools.
[Figure: inferred effective size N vs. time t; labeled values 3,000, 90,000, 500, 7,500,000 and 5,000; 875 years ago; present]
![Page 26: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/26.jpg)
Ongoing Analysis
• Exome analysis
– Genes with AJ-specific high mutation load
• Mobile element insertions
– Frequencies of common insertions correlated with 1KG
• AJ disease genes (Ostrer & Skorecki, Human Genetics 2012)
– Some carriers detected
– 276 non-synonymous mutations, >65 known
– 60 loss-of-function
[Figure: average missense allele load, TAGC vs. CEU]
[Figure: MEI frequency in TAGC vs. 1000 Genomes; R² = 0.746]
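The R² reported for MEI frequencies compares per-insertion frequencies between two datasets. For a least-squares line with an intercept, R² equals the squared Pearson correlation; the sketch below assumes the two frequency lists are already matched insertion by insertion (the matching step is not shown and the input format is hypothetical).

```python
def r_squared(x, y):
    """Squared Pearson correlation of matched frequency lists, equal to the
    R^2 of a least-squares line with intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```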
![Page 27: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/27.jpg)
Summary
• AJ bottleneck and expansion reveal potential for genetic studies.
• High-quality genomes sequenced by TAGC indicate utility in a clinical setting.
• Complete variant discovery improves demographic inference; subtle differences from Europeans.
• Future directions:
– Imputation power using TAGC vs. 1000 Genomes
– Local ancestry inference
– Effect of natural selection
![Page 28: The Ashkenazi Genome Project](https://reader034.fdocuments.net/reader034/viewer/2022042616/56815fce550346895dcecca6/html5/thumbnails/28.jpg)
Thank you!
TAGC consortium members:
Columbia University Computer Science: Itsik Pe’er, Pier Francesco Palamara
Undergrads: Fillan Grady, Ethan Kochav, James Xue
IT: Shlomo Hershkop
Long Island Jewish Medical Center: Todd Lencz, Semanti Mukherjee, Saurav Guha
Columbia University Medical Center: Lorraine Clark, Xinmin Liu
Albert Einstein College of Medicine: Gil Atzmon, Harry Ostrer
Mount Sinai School of Medicine: Inga Peter, Laurie Ozelius
Memorial Sloan Kettering Cancer Center: Ken Offit, Vijai Joseph
Yale School of Medicine: Judy Cho, Ken Hui, Monica Bowen
The Hebrew University of Jerusalem: Ariel Darvasi
Funding: Human Frontier Science Program.
VIB, Gent, Belgium: Herwig Van Marck, Stephane Plaisance
Complete Genomics: Jason Laramie