Variant Calling II
![Page 1: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/1.jpg)
Variant calling while accounting for alternate haplotypes
Aaron Quinlan University of Virginia
quinlanlab.org
Genome Reference Consortium Workshop Sept 21, 2014
![Page 2: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/2.jpg)
Motivation: The human reference genome has long had alt loci, but we don’t handle them properly
Deanna Church and Brad Holmes
Regions w/ alternate loci (261 in total)
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/
![Page 3: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/3.jpg)
A case study: The MAPT locus (17q21.31)
doi:10.1038/nrneurol.2012.169
Disease phenotypes associated with each haplotype
H2 positively selected in Europeans; carriers predisposed to 17q21.31 microdeletion syndrome via increased NAHR (non-allelic homologous recombination)
H1 associated with Alzheimer's and Parkinson's diseases
![Page 4: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/4.jpg)
A case study: The MAPT locus (17q21.31)
[Figure: UCSC Genome Browser view of chr17, roughly 45,500,000 to 47,000,000 (hg38). Tracks: Contigs New to GRCh38/(hg38), Not Carried Forward from GRCh37/(hg19); GRCh38 Alignments to the Alternate Sequences/Haplotypes (chr17_GL000258v2_alt, chr17_KI270908v1_alt); GRCh38 Haplotype to Reference Sequence Mapping Correspondence; RefSeq Genes (including CRHR1, MAPT, KANSL1, NSF, WNT3); Multiz Alignment & Conservation (7 Species); Repeating Elements by RepeatMasker.]
![Page 5: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/5.jpg)
H1 and inverted H2 don’t recombine
Extensive LD owing to suppressed recombination between H1 and inverted H2
![Page 6: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/6.jpg)
Several SNPs distinguish H1 from H2
…
![Page 7: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/7.jpg)
Kitzman et al: NA10847 is heterozygous H1/H2
![Page 8: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/8.jpg)
Kitzman et al: NA10847 is heterozygous H1/H2
[Figure label: NA10847]
![Page 9: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/9.jpg)
How do we support variant detection with
alternate alleles using the VCF format?
![Page 10: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/10.jpg)
NA10847 is heterozygous H1/H2

| Haplotype | Alleles |
|---|---|
| H1 (chr17) | A G A C A C |
| H2 (GL000258v2_alt) | T A G T C T |
![Page 11: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/11.jpg)
NA10847 (het. H1/H2) carries both allele sets: A G A C A C (H1) and T A G T C T (H2)
![Page 12: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/12.jpg)
Requires that we align reads to both loci*
* We need to be able to distinguish multiple alignments arising from ALT loci from multiple mappings arising from segdups and repetitive elements. Otherwise, MAPQs are penalized and/or alignments are not reported, depending on the behavior of the aligner.
Heng Li has started a discussion about how best to make BWA alt-aware.
Colin Hercus has started a discussion about how best to make Novoalign alt-aware.
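The distinction in the footnote can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how BWA or Novoalign behave: the `ALT_TO_PRIMARY` table, the `classify_hits` function, and the `(contig, position)` hit format are all hypothetical, and the check is deliberately coarse (it does not lift alt coordinates over to the primary assembly).

```python
# Hypothetical lookup: alt contig -> (primary chrom, start, end) it represents.
# The MAPT entry uses the coordinates shown elsewhere in this talk.
ALT_TO_PRIMARY = {
    "chr17_GL000258v2_alt": ("chr17", 45309498, 46836265),
}


def classify_hits(hits, alt_to_primary=ALT_TO_PRIMARY):
    """Classify one read's alignments.

    hits: list of (contig, position) tuples, one per reported alignment.
    Returns "unique", "alt_only" (the extra hits are explained by alt loci),
    or "ambiguous" (e.g. segdups or repeats).
    """
    if len(hits) <= 1:
        return "unique"
    contigs = [contig for contig, _ in hits]
    if len(set(contigs)) < len(contigs):
        # Two hits on the same contig cannot be explained by alt loci.
        return "ambiguous"
    regions = set()
    for contig, pos in hits:
        if contig in alt_to_primary:
            # Project an alt-contig hit onto the region it represents.
            regions.add(alt_to_primary[contig])
        else:
            covering = [
                r for r in alt_to_primary.values()
                if r[0] == contig and r[1] <= pos <= r[2]
            ]
            # Hits outside any alt-covered region count as their own site.
            regions.add(covering[0] if covering else (contig, pos, pos))
    return "alt_only" if len(regions) == 1 else "ambiguous"
```

A read classified as "alt_only" would be a candidate for keeping its MAPQ rather than being penalized as a multi-mapper.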
![Page 13: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/13.jpg)
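NA10847 is heterozygous H1/H2
Variant calling

VCF entries for chr17 (H1)

| CHROM | REF | ALT | GT |
|---|---|---|---|
| chr17 | A | T | 0/1 |
| chr17 | G | A | 0/1 |
| chr17 | A | G | 0/1 |
| chr17 | C | T | 0/1 |
| chr17 | A | C | 0/1 |
| chr17 | C | T | 0/1 |

VCF entries for alt locus (H2)

| CHROM | REF | ALT | GT |
|---|---|---|---|
| GL000258v2_alt | T | A | 0/1 |
| GL000258v2_alt | A | G | 0/1 |
| GL000258v2_alt | G | A | 0/1 |
| GL000258v2_alt | T | C | 0/1 |
| GL000258v2_alt | C | A | 0/1 |
| GL000258v2_alt | T | C | 0/1 |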
![Page 14: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/14.jpg)
Note: Allelic relationship is not reflected in “raw” VCF
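To make the missing relationship concrete, here is a small Python sketch that pairs the chr17 and alt-locus calls from the slide and checks that each pair is a mirror image (REF and ALT swapped, same genotype). The row-by-row pairing of the two columns is an assumption about the slide layout, and `is_mirrored` is a hypothetical helper, not part of any variant caller.

```python
# (REF, ALT, GT) for each call, in slide order.
chr17_calls = [
    ("A", "T", "0/1"), ("G", "A", "0/1"), ("A", "G", "0/1"),
    ("C", "T", "0/1"), ("A", "C", "0/1"), ("C", "T", "0/1"),
]
alt_calls = [
    ("T", "A", "0/1"), ("A", "G", "0/1"), ("G", "A", "0/1"),
    ("T", "C", "0/1"), ("C", "A", "0/1"), ("T", "C", "0/1"),
]


def is_mirrored(primary, alt):
    """True when the two records swap REF and ALT and share a genotype."""
    p_ref, p_alt, p_gt = primary
    a_ref, a_alt, a_gt = alt
    return p_ref == a_alt and p_alt == a_ref and p_gt == a_gt


# Assumed pairing: the i-th chr17 call and the i-th alt call are the same site.
mirrored = [is_mirrored(p, a) for p, a in zip(chr17_calls, alt_calls)]
```

Every pair is mirrored, yet nothing in the raw records says that the twelve heterozygous calls describe a single H1/H2 individual at six sites.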
![Page 15: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/15.jpg)
Proposal: Develop a downstream tool that leverages informative SNPs to distinguish and assign haplotypes at alt loci via a standard VCF file.
Intermediate solution until variant callers handle this complexity natively
![Page 16: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/16.jpg)
Overview
“Raw” V/BCF file
“database” of alt loci & inform. SNPs
![Page 17: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/17.jpg)
| alt_locus | main_locus |
|---|---|
| GL000258.2 (H2) | chr17:45309498-46836265 (H1) |

| infor. markers | GRCh38 position | H1 | H2 |
|---|---|---|---|
| rs241039 | 45637307 | A | T |
| rs2049515 | 45684490 | C | T |
| . . . | | | |
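One way such a marker "database" could drive haplotype assignment is sketched below in Python. The two markers are the ones listed on the slide; the dictionary layout, the `predict_haplotypes` function, and the rule of requiring all informative sites to agree are assumptions made for illustration, not the proposed tool's actual design.

```python
# Marker values come from the slide; the dict layout is an assumption.
MARKERS = {
    45637307: {"rsid": "rs241039", "H1": "A", "H2": "T"},
    45684490: {"rsid": "rs2049515", "H1": "C", "H2": "T"},
}


def predict_haplotypes(observed, markers=MARKERS):
    """Predict a sample's haplotype pair from informative markers.

    observed: {GRCh38 position: (allele1, allele2)} for one sample on chr17.
    Returns a sorted pair such as ("H1", "H2"), or None when no marker is
    informative or the markers disagree.
    """
    votes = set()
    for pos, alleles in observed.items():
        marker = markers.get(pos)
        if marker is None:
            continue  # not an informative site
        by_allele = {marker["H1"]: "H1", marker["H2"]: "H2"}
        if any(allele not in by_allele for allele in alleles):
            continue  # allele matches neither haplotype; skip this site
        votes.add(tuple(sorted(by_allele[allele] for allele in alleles)))
    return votes.pop() if len(votes) == 1 else None
```

For example, a sample that is A/T at 45637307 and C/T at 45684490 would be predicted as `("H1", "H2")`. A real tool would likely need to tolerate genotyping errors rather than demand unanimous markers.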
![Page 18: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/18.jpg)
New tool(s)
![Page 19: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/19.jpg)
Updated V/BCF file
Haplotype predictions per indiv.
![Page 20: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/20.jpg)
Augment VCF with assembly information
##seq-info=<name=chr17,id=CM000679.2>
##region-info=<name=MAPT,id=GL000258.2,assoc_id=CM000679.2,reg=45309498-46836265>
Based on ideas from Deanna Church
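A downstream tool would need to read these proposed header lines back in. The Python sketch below parses them into a dictionary; `parse_assembly_header` is a hypothetical helper, and since `##seq-info` and `##region-info` are draft proposals rather than part of the VCF specification, the parsing rules here are assumptions.

```python
import re


def parse_assembly_header(line):
    """Parse a proposed '##seq-info=<...>' or '##region-info=<...>' line.

    Returns (kind, fields), or None for any other header line.
    """
    match = re.match(r"##(seq-info|region-info)=<(.*)>\s*$", line.strip())
    if match is None:
        return None
    kind, body = match.groups()
    fields = {}
    for item in body.split(","):
        key, _, value = item.strip().partition("=")
        fields[key] = value
    if "reg" in fields:
        # Turn "start-end" into a pair of integers.
        start, end = fields["reg"].split("-")
        fields["reg"] = (int(start), int(end))
    return kind, fields
```

Applied to the region line above, this yields the alt locus id (`GL000258.2`), the associated primary sequence (`CM000679.2`), and the interval as integers.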
![Page 21: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/21.jpg)
Introduce new (reserved) VCF INFO and FORMAT tags
##INFO=<ID=ALTLOCS,Number=.,Type=String,Description="A list of the alternate loci in the reference genome that are associated with this locus">
##INFO=<ID=ALTHAPS,Number=.,Type=String,Description="A list of the known haplotypes that are associated with this locus">
##FORMAT=<ID=HT,Number=1,Type=String,Description="Haplotype combination based on ALTHAPS">
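As a rough illustration of how a post-processing step might attach these tags, the Python sketch below appends ALTLOCS and ALTHAPS to a record's INFO string when the site falls inside a region with alt loci. The `annotate_info` function and the layout of the `regions` list are hypothetical; only the tag names come from the proposal.

```python
def annotate_info(chrom, pos, info, regions):
    """Append the proposed ALTLOCS/ALTHAPS tags to an INFO string.

    regions: list of dicts with keys chrom, start, end, alt_loci, haplotypes
    (a hypothetical layout). Sites outside every region are left unchanged.
    """
    for region in regions:
        if region["chrom"] == chrom and region["start"] <= pos <= region["end"]:
            tags = [
                "ALTLOCS=" + ",".join(region["alt_loci"]),
                "ALTHAPS=" + ",".join(region["haplotypes"]),
            ]
            # "." is the VCF placeholder for an empty INFO field.
            existing = [] if info in ("", ".") else [info]
            return ";".join(existing + tags)
    return info


# Example region built from the MAPT coordinates used in this talk.
MAPT_REGIONS = [{
    "chrom": "chr17", "start": 45309498, "end": 46836265,
    "alt_loci": ["GL000258.2"], "haplotypes": ["H1", "H2"],
}]
```

Because the tags are appended rather than substituted, any INFO annotations written by the original variant caller are preserved.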
![Page 22: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/22.jpg)
(Draft) example VCF output, post “correction”
##seq-info=<name=chr17,id=CM000679.2>
##region-info=<name=MAPT,id=GL000258.2,assoc_id=CM000679.2,reg=45309498-46836265>
##INFO=<ID=ALTLOCS,Number=.,Type=String,Description="A list of the alternate loci in the reference genome that are associated with this locus">
##INFO=<ID=ALTHAPS,Number=.,Type=String,Description="A list of the known haplotypes that are associated with this locus">

| #CHROM | POS | REF | ALT | INFO | FORMAT | NA10847 |
|---|---|---|---|---|---|---|
| chr17 | 111 | A | T | ALTLOCS=GL000258.2;ALTHAPS=H1,H2 | GT:HT | 0/1:0/1 |
| chr17 | 222 | G | A | ALTLOCS=GL000258.2;ALTHAPS=H1,H2 | GT:HT | 0/1:0/1 |
| chr17 | 333 | A | G | ALTLOCS=GL000258.2;ALTHAPS=H1,H2 | GT:HT | 0/1:0/1 |
| . . . | | | | | | |
| GL000258.2 | 111 | A | T | ALTLOCS=chr17;ALTHAPS=H1,H2 | GT:HT | 0/1:0/1 |
| GL000258.2 | 222 | G | A | ALTLOCS=chr17;ALTHAPS=H1,H2 | GT:HT | 0/1:0/1 |
| GL000258.2 | 333 | A | G | ALTLOCS=chr17;ALTHAPS=H1,H2 | GT:HT | 0/1:0/1 |
| . . . | | | | | | |
![Page 23: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/23.jpg)
Having only two distinct haplotypes is the base case.
![Page 24: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/24.jpg)
For example, NA10847 is actually H1/H2D
H2D is derived from the ancestral H2.
Markers that distinguish H2D from H2
![Page 25: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/25.jpg)
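Interpretation is much harder w/ many haplotypes
[Figure labels: H1, H2, H2D]

VCF entries for chr17

| CHROM | REF | ALT | GT |
|---|---|---|---|
| chr17 | A | T | 0/1 |
| chr17 | G | A | 0/1 |
| chr17 | A | G | 0/1 |
| chr17 | C | T | 0/1 |
| chr17 | A | C | 0/1 |
| chr17 | C | T | 0/1 |
| chr17 | C | T | 0/1 |
| chr17 | C | T | 0/1 |
| chr17 | G | A | 0/1 |
| chr17 | A | G | 0/1 |
| chr17 | C | T | 0/1 |

VCF entries for alt locus

| CHROM | REF | ALT | GT |
|---|---|---|---|
| GL000258v2_alt | T | A | 0/1 |
| GL000258v2_alt | A | G | 0/1 |
| GL000258v2_alt | G | A | 0/1 |
| GL000258v2_alt | T | C | 0/1 |
| GL000258v2_alt | C | A | 0/1 |
| GL000258v2_alt | T | C | 0/1 |
| GL000258v2_alt | C | T | 0/1 |
| GL000258v2_alt | C | T | 0/1 |
| GL000258v2_alt | G | A | 0/1 |
| GL000258v2_alt | A | G | 0/1 |
| GL000258v2_alt | C | T | 0/1 |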
![Page 26: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/26.jpg)
(Draft) example VCF output, post “correction”
##seq-info=<name=chr17,id=CM000679.2>
##region-info=<name=MAPT,id=GL000258.2,assoc_id=CM000679.2,reg=45309498-46836265>
##INFO=<ID=ALTLOCS,Number=.,Type=String,Description="A list of the alternate loci in the reference genome that are associated with this locus">
##INFO=<ID=ALTHAPS,Number=.,Type=String,Description="A list of the known haplotypes that are associated with this locus">

| #CHROM | POS | REF | ALT | INFO | FORMAT | NA10847 | NA12878 | NA21599 |
|---|---|---|---|---|---|---|---|---|
| chr17 | 111 | A | T | ALTLOCS=GL000258.2;ALTHAPS=H1,H2,H2D | GT:HT | 0/1:0/2 | 0/0:0/0 | 1/1:2/2 |
| chr17 | 222 | G | A | ALTLOCS=GL000258.2;ALTHAPS=H1,H2,H2D | GT:HT | 0/1:0/2 | 0/0:0/0 | 1/1:2/2 |
| chr17 | 333 | A | G | ALTLOCS=GL000258.2;ALTHAPS=H1,H2,H2D | GT:HT | 0/1:0/2 | 0/0:0/0 | 1/1:2/2 |
| . . . | | | | | | | | |
| GL000258.2 | 111 | A | T | ALTLOCS=chr17;ALTHAPS=H1,H2,H2D | GT:HT | 0/1:0/2 | 0/0:0/0 | 1/1:2/2 |
| GL000258.2 | 222 | G | A | ALTLOCS=chr17;ALTHAPS=H1,H2,H2D | GT:HT | 0/1:0/2 | 0/0:0/0 | 1/1:2/2 |
| GL000258.2 | 333 | A | G | ALTLOCS=chr17;ALTHAPS=H1,H2,H2D | GT:HT | 0/1:0/2 | 0/0:0/0 | 1/1:2/2 |
| . . . | | | | | | | | |
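The HT field in this draft appears to index into the ALTHAPS list the same way GT indexes into REF and ALT. Under that assumption (the slides do not spell out the encoding), a reader could decode it with a few lines of Python; `decode_ht` is a hypothetical helper.

```python
def decode_ht(info, ht):
    """Translate an HT call such as '0/2' into haplotype names.

    Assumes HT values are zero-based indexes into the ALTHAPS list found in
    the record's INFO string. Missing values ('.') are passed through.
    """
    tags = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    names = tags["ALTHAPS"].split(",")
    sep = "|" if "|" in ht else "/"
    return sep.join(
        "." if index == "." else names[int(index)]
        for index in ht.split(sep)
    )
```

With the INFO string from the records above, `0/2` decodes to H1/H2D, `0/0` to H1/H1, and `2/2` to H2D/H2D.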
![Page 27: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/27.jpg)
| sample | chrom | start | end | hap1 | hap2 | markers |
|---|---|---|---|---|---|---|
| NA10847 | chr17 | 45309498 | 46836265 | H1 | H2D | rs241039,rs434428,… |
| NA12878 | chr17 | 45309498 | 46836265 | H1 | H1 | rs241039,rs434428,… |
| NA21599 | chr17 | 45309498 | 46836265 | H2D | H2D | rs241039,rs434428,… |
![Page 28: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/28.jpg)
![Page 29: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/29.jpg)
The good and the bad
Good
• No burden on existing variant callers to adapt to calling w.r.t. alt loci
• Tool for updating VCF can be updated and improved in parallel with variant callers.
Bad
• One more step / file in the variant interpretation pipeline
• This strategy is only applicable to cases where informative markers exist. Use CNVs in WGS: e.g., KANSL1 partial duplications to distinguish MAPT alt loci
![Page 30: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/30.jpg)
To Do / Discuss
• How best to improve alignment strategies to facilitate variant and haplotype detection?
• How to best represent the resolved alternate loci in VCF format?
![Page 31: Variant Calling II](https://reader034.fdocuments.net/reader034/viewer/2022052323/558b03add8b42a9d688b467c/html5/thumbnails/31.jpg)
Many thanks for helpful discussions with:
Deanna Church
Brad Holmes
Karyn Meltz-Steinberg
Heng Li
Colin Hercus