Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant...
![Page 1: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/1.jpg)
Variant Calling
CHRIS FIELDS
MAYO-ILLINOIS COMPUTATIONAL GENOMICS WORKSHOP, JUNE 19, 2017
![Page 2: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/2.jpg)
Up-front acknowledgments
Many figures/slides come from:
◦ GATK Workshop slides: http://www.broadinstitute.org/gatk/guide/events?id=2038
◦ IGV Workshop slides: http://lanyrd.com/2013/vizbi/scdttf/
◦ Denis Bauer (CSIRO): http://www.allpower.de/
◦ Many varied publications
![Page 3: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/3.jpg)
Background
Variant calling and use cases
Errors vs. actual variants
Experimental design (GATK focus)
Small variant (SNV/small indel) analysis
◦ GATK pipeline
◦ Formats encountered within
Structural Variation (SV) analysis
Association analysis (briefly)
![Page 4: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/4.jpg)
Variant Calling
As the name implies, we’re looking for differences (variations)
◦ Reference – reference genome (hg19, b37)
◦ Sample(s) – one or more comparative samples
Start with raw sequence data
End with a human ‘diff’ file, recording the variants
Additional information added downstream:
◦ Filters (quality of the calls)
◦ Functional annotation
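To make the ‘diff’ analogy concrete, here is a toy sketch in Python. It is not how GATK or any real caller works: it assumes the sample is already a gap-free string aligned base-for-base to the reference, and the function name `naive_snv_diff` is made up for illustration.

```python
def naive_snv_diff(reference: str, sample: str, start: int = 1) -> list[tuple[int, str, str]]:
    """Report (position, reference base, sample base) wherever the two differ.

    Assumes both strings have the same length and are already aligned,
    so only single-base substitutions can be reported (no indels).
    """
    if len(reference) != len(sample):
        raise ValueError("toy diff needs equal-length, pre-aligned sequences")
    return [
        (start + i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# One substitution at the fourth base (1-based coordinates).
print(naive_snv_diff("ACGTACGT", "ACGAACGT"))
```

A real pipeline has to infer this list from millions of error-prone short reads, which is why the remaining slides deal with quality, alignment and filtering.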
![Page 5: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/5.jpg)
Variations
Difference between 2 individuals: 1 every 1000 bp
◦ ~2.7 million differences
Small (<50 bp)
◦ SNV – single nucleotide
◦ Small insertions or deletions (‘indels’)
Large (structural variations)
◦ Indels >50 bp
◦ Copy Number Variations
◦ Inversions
◦ Translocations
![Page 6: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/6.jpg)
Variations
Mainly focus on diploid organisms
◦ Human:
  ◦ 22 pairs of autosomal chromosomes
    ◦ One from mother, one from father
  ◦ 2 sex chromosomes (female XX, male XY)
    ◦ One from mother, one from father (where does Y come from for male offspring?)
  ◦ Mitochondrial genome (generally maternally inherited)
    ◦ 100-10,000 copies per cell
Variation can be in
◦ One chromosome (heterozygous, or ‘het’)
◦ Both chromosomes (homozygous, or ‘hom’)
![Page 7: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/7.jpg)
Use cases
Medicine
• Hereditary or genetic diseases, genetic predisposition to disease
• Normal vs. tumor analyses
• Heteroplasmy
Population genetics
![Page 8: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/8.jpg)
Population genetics
The 1000 Genomes Project
[Figure: sampled populations, labeled CEU, JPT, CHB, YRI, LWK, MXL, ASW, GBR, FIN, TSI, CHS, CLM, PUR, IBS, CDX, KHV, GWD, ACB, AJM, PEL, PJL, MAB, ADH, KAK, RDH, MRM]
1,100 samples early 2011; 2,500 samples 2011/12
The full 1000 Genomes Project data
![Page 9: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/9.jpg)
Cancer
NGS studies of cancer progression
This approach was published recently in a study designed to compare the tumor genomes of patients with de novo AML to their relapse genomes [21]. After sequencing each genome (de novo tumor and relapse tumor) and the matched normal from skin for each patient, somatic mutations and structural variants were identified. Some of these appeared to be unique to the relapse sample in each case. We then obtained high sequencing read depth at each somatic mutation site in the de novo and relapse tumors, and characterized the reads that contained the mutated base(s) at each site to calculate an allele frequency of that variant in the tumor cell population. Using kernel density estimation, we then identified groups of mutations present at the same allele frequencies, indicative of their prevalence in the tumor cell population. This comparison of allele frequency groups between de novo and relapse disease allowed us to model the relative numbers of tumor subclones at each disease presentation, and defined AML progression as a clonal process, as illustrated in Figure 2. Namely, all subclones originate from a founder clone that shares all but the newest mutations, and relapse disease shares mutations with the founder clone as well as new mutations that portend its proliferative advantage in the relapse presentation.
In a similar study, with a slightly different experimental design, we recently explored the differences between myelodysplastic syndrome (MDS) genomes and the genomes found in those patients’ secondary AML (sAML) tumors. MDS identifies a heterogeneous group of syndromes characterized by dysplasia and ineffective hematopoiesis. Since about 1/3 of these patients progress to sAML for reasons that are not well understood at the genomic level, we characterized these genomes to understand novel somatic variants in the sAML cells. In our study, the results were quite different than the de novo to relapse AML study outlined above. Namely, we found that the sAML genomes were all oligoclonal (comprised of several related tumor cell subclones, each with unique sets of mutations), each containing a pre-existing MDS founder clone that was out-competed in the sAML tumor cell population in some cases. We hypothesized that the oligoclonal nature of the sAML presentation may contribute to the very poor response rates of these patients to
[Figure 2: clonal evolution for patient AML1/UPN933124, from HSCs carrying random mutations, through diagnosis (multiple leukemic clones present), chemotherapy and clinical remission (loss of most leukemic clones), to relapse (acquisition of new mutations in a pre-existing clone). Cell types: normal, AML. Mutation classes: random mutations in HSCs, pathogenic mutations. Mutation clusters: founder (cluster 1), primary specific (cluster 2), relapse enriched (clusters 3 and 4), relapse specific (cluster 5). Labeled genes: DNMT3A, NPM1, FLT3, PTPRT; ETV6, WNK1-WAC, MYO18B. Subclone proportions shown: 12.74%, 53.12%, 29.04%, 5.10%.]
Model of the clonal progression process that occurs between the initial (de novo) and relapse presentation in AML patients. At diagnosis, this patient has an oligoclonal disease characterized by four different subclones, each present at a specific proportion in the tumor cell population and with a specific mutational profile. Chemotherapy used to induce the patient into remission decreases clonal heterogeneity but a single subclone persists, acquires new mutations, and again proliferates in the bone marrow as a relapse-specific subclone.
E. Mardis, Current Opinion in Genetics & Development 2012, 22:245–250
![Page 10: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/10.jpg)
Heteroplasmy
Mixture of more than one type of organelle genome within a cell
◦ 100-10,000 mtDNA copies/cell
Predisposing factor for mitochondrial diseases
◦ Late (adult) onset
Possibly beneficial in some cases
◦ Centenarians have above-average frequency of heteroplasmy
Coble et al., PLOS One, 2009; 4(3):e4838
![Page 11: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/11.jpg)
Variants vs. Errors
Must distinguish between actual variation (real change) and errors (artifacts) introduced into the analysis
Errors can creep in on various levels:
◦ PCR artifacts (amplification of errors)
◦ Sequencing (errors in base calling)
◦ Alignment (misalignment, mis-gapped alignments)
◦ Variant calling (low depth of coverage, few samples)
◦ Genotyping (poor annotation)
Try to control for these when possible to reduce false positives w/o incurring (worse) false negatives
![Page 12: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/12.jpg)
Example: Initial raw sequence data
![Page 13: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/13.jpg)
Sequence quality
Different technologies have different errors and error rates
◦ 454 – homopolymer tract errors
◦ Illumina – substitution errors
◦ PacBio – indels. Lots and lots of ‘em.
Represented as a quality score (Phred)
◦ Q = -10 log10(e), where e is the probability of an incorrect base call

| Phred Quality Score | Probability of incorrect base call | Base call accuracy |
|---|---|---|
| 10 | 1 in 10 | 90% |
| 20 | 1 in 100 | 99% |
| 30 | 1 in 1000 | 99.9% |
| 40 | 1 in 10000 | 99.99% |
| 50 | 1 in 100000 | 99.999% |
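The Phred relationship can be checked with a few lines of Python. This is a minimal sketch of the formula on the slide; the function names are chosen here for illustration and are not from any particular library.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong: e = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(e: float) -> float:
    """Phred score for an error probability e (0 < e <= 1): Q = -10 log10(e)."""
    return -10 * math.log10(e)


# Reproduce the rows of the table from the formula.
for q in (10, 20, 30, 40, 50):
    e = phred_to_error_prob(q)
    print(f"Q{q}: 1 in {round(1 / e)} calls wrong, accuracy {100 * (1 - e):.3f}%")
```

Because the scale is logarithmic, every 10 points of Q means a tenfold drop in the error probability.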
![Page 14: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/14.jpg)
Formats: FASTQ – ‘sequence with quality’
Three ‘variants’ – Sanger, Illumina, Solexa (Sanger is most common)
May be ‘raw’ data (straight from seq pipeline) or processed (trimmed for various reasons)
Can hold 100s of millions of records per sample
Files can be very large (100s of GB) apiece
@HWI-ST1155:109:D0L23ACXX:5:1101:2247:1985 1:N:0:GCCAAT
NTTCCTTTGACAAATATTAAAATTAAGAATCAAATATGGTAGTGTATGCCAAGACCTAGTCTGAGTCAGTAGGAT
+
#1=DDFFFHHHHHJJJJJJIJJJIJJIJIJJJJJJJIJI?FHFHEIJEIIIEGFFHHGIGHIJEIFGIJHGDIII
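The quality line of the record above can be decoded in Python. This sketch assumes the Sanger (Phred+33) encoding, which is consistent with the characters in this record; the function name is illustrative.

```python
def decode_phred33(quality: str) -> list[int]:
    """Convert a Sanger/Phred+33 quality string to integer Phred scores."""
    return [ord(ch) - 33 for ch in quality]


quality_line = (
    "#1=DDFFFHHHHHJJJJJJIJJJIJJIJIJJJJJJJIJI?FHFHEIJEIIIEGFFHHGIGHIJEIFGIJHGDIII"
)
scores = decode_phred33(quality_line)

# First five bases: '#', '1', '=', 'D', 'D'
print(scores[:5])  # [2, 16, 28, 35, 35]

# Bin the scores with the thresholds used on the following slides.
very_low = sum(1 for q in scores if q < 10)
low = sum(1 for q in scores if 10 <= q < 20)
high = sum(1 for q in scores if q > 30)
print(f"very low (<10): {very_low}, low (<20): {low}, high (>30): {high}")
```

The first base (called as `N`) has Q2, the second has Q16, and most of the read sits between Q35 and Q41.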
![Page 15: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/15.jpg)
Formats: FASTQ – ‘sequence with quality’
@HWI-ST1155:109:D0L23ACXX:5:1101:2247:1985 1:N:0:GCCAAT
NTTCCTTTGACAAATATTAAAATTAAGAATCAAATATGGTAGTGTATGCCAAGACCTAGTCTGAGTCAGTAGGAT
+
#1=DDFFFHHHHHJJJJJJIJJJIJJIJIJJJJJJJIJI?FHFHEIJEIIIEGFFHHGIGHIJEIFGIJHGDIII
Very low Phred score, less than 10 (here the leading #, Q2 in Phred+33 encoding)
![Page 16: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/16.jpg)
Formats: FASTQ – ‘sequence with quality’
@HWI-ST1155:109:D0L23ACXX:5:1101:2247:1985 1:N:0:GCCAAT
NTTCCTTTGACAAATATTAAAATTAAGAATCAAATATGGTAGTGTATGCCAAGACCTAGTCTGAGTCAGTAGGAT
+
#1=DDFFFHHHHHJJJJJJIJJJIJJIJIJJJJJJJIJI?FHFHEIJEIIIEGFFHHGIGHIJEIFGIJHGDIII
Low Phred score, <20 (here the 1 in second position, Q16 in Phred+33 encoding)
![Page 17: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/17.jpg)
Formats: FASTQ – ‘sequence with quality’
@HWI-ST1155:109:D0L23ACXX:5:1101:2247:1985 1:N:0:GCCAAT
NTTCCTTTGACAAATATTAAAATTAAGAATCAAATATGGTAGTGTATGCCAAGACCTAGTCTGAGTCAGTAGGAT
+
#1=DDFFFHHHHHJJJJJJIJJJIJJIJIJJJJJJJIJI?FHFHEIJEIIIEGFFHHGIGHIJEIFGIJHGDIII
High quality reads, Phred score >30 (here the characters D through J, Q35 to Q41 in Phred+33 encoding)
![Page 18: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/18.jpg)
How do sequencing errors occur?
![Page 19: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/19.jpg)
Illumina Sequencing
![Page 20: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/20.jpg)
Illumina Sequencing
![Page 21: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/21.jpg)
Illumina Sequencing
![Page 22: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/22.jpg)
Illumina Sequencing
![Page 23: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/23.jpg)
Check sequence data!
![Page 24: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/24.jpg)
Basic Experimental Design
![Page 25: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/25.jpg)
Terminology
Lane – Physical sequencing lane
Library – Unit of DNA prep pooled together
Sample – Single individual
Cohort – Collection of samples analyzed together
[Figure labels: lane, flowcell]
![Page 26: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/26.jpg)
Terminology
Sample (single) vs. Cohort (multiple)
[Figure labels: Library, Lane, Flowcell]
In general, Library = Sample
![Page 27: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/27.jpg)
Terminology
WGS vs. Exome Capture
◦ Whole genome sequencing – everything
  ◦ High cost if each sample is sequenced deeply (>25-30x)
  ◦ Can instead run many samples at low coverage
◦ Exome capture – targeted sequencing (1-5% of genome)
  ◦ Deeper coverage of transcribed regions
  ◦ Misses other important non-coding regions (promoters, introns, enhancers, small RNA, etc.)
![Page 28: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/28.jpg)
Single- vs. multi-sample analysis
Fixed sequencing budget
Deep single-sample (Sample 1; found 3 variants total)
• Higher sensitivity for variants in the sample
• More accurate genotyping per sample
• Cost: no information about other samples
Shallower multi-sample (Samples 1-4; found 5 variants total)
• Sensitivity dependent on frequency of variation
• Worse genotyping
• More total variants discovered
![Page 29: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/29.jpg)
High-pass sequencing design
[Figure: ~30x reads over a region spanning Intergenic, Exon I, Intron I, Exon II, Intergenic, with a variant site marked]

| Data requirements per sample | | Variant detection among multiple samples | |
|---|---|---|---|
| Target bases | ~3 Gb | Variants found per sample | ~3-5M |
| Coverage | Avg. 30x | Percent of variation in genome | >99% |
| # sequenced bases | 100 Gb | Pr{singleton discovery} | >99% |
| # per lane (HiSeq 4000) | ~1 | Pr{common allele discovery} | >99% |

The underlying figure lists ~8 lanes of HiSeq per sample.
Excellent sensitivity for hetero- and homozygotes
High depth allows excellent genotype calling
![Page 30: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/30.jpg)
Low-pass sequencing design
[Figure: ~4x reads over a region spanning Intergenic, Exon I, Intron I, Exon II, Intergenic, with variant sites marked]

| Data requirements per sample | | Variant detection among multiple samples | |
|---|---|---|---|
| Target bases | ~3 Gb | Variants found per sample | ~3M |
| Coverage | Avg. 4x | Percent of variation in genome | ~90% |
| # sequenced bases | 20 Gb | Pr{singleton discovery} | <50% |
| # per lane (HiSeq 4000) | ~6 | Pr{common allele discovery} | ~99% |

The underlying figure lists ~1.25 lanes of HiSeq per sample.
Variants missed by sampling
Heterozygotes can be mistaken for homozygotes due to sampling
Significantly better power to detect homozygous sites
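Why heterozygotes get mistaken for homozygotes at low depth can be illustrated with a small binomial calculation in Python. This is a simplified sketch under stated assumptions: every read at a heterozygous site samples either allele with probability 0.5, reads are independent, and there are no sequencing errors. The function names are illustrative.

```python
from math import comb


def prob_het_looks_hom(depth: int) -> float:
    """Chance that all reads at a true het site come from the same allele."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # Either all reads show allele A or all show allele B.
    return 2 * 0.5 ** depth


def prob_at_least_k_alt_reads(depth: int, k: int) -> float:
    """Chance of seeing the alternate allele in at least k of `depth` reads."""
    fewer_than_k = sum(comb(depth, i) for i in range(k)) / 2 ** depth
    return 1 - fewer_than_k


for depth in (4, 10, 30):
    print(
        f"{depth}x: P(het looks hom) = {prob_het_looks_hom(depth):.2e}, "
        f"P(>=2 alt reads) = {prob_at_least_k_alt_reads(depth, 2):.4f}"
    )
```

Under these assumptions, a site covered by exactly 4 reads shows only one allele 12.5% of the time, while at 30 reads the chance is negligible. Real coverage varies around the average, so many sites in a 4x genome have even fewer reads.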
![Page 31: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/31.jpg)
Exome Capture
![Page 32: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/32.jpg)
Exome Capture (TruSeq Exome Capture)
![Page 33: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/33.jpg)
Exome capture sequencing design
[Figure: 150x reads concentrated on Exon I and Exon II, little off-target coverage over Intron I and intergenic regions, with a variant site marked]

| Data requirements per sample | | Variant detection among multiple samples | |
|---|---|---|---|
| Target bases | 45 Mb | Variants found per sample | ~25,000 |
| Coverage | >80% at 20x | Percent of variation in genome | 0.5% |
| # sequenced bases | 5 Gb | Pr{singleton discovery} | ~95% |
| # per lane (HiSeq 4000) | 24 | Pr{common allele discovery} | ~95% |

The underlying figure lists ~32 Mb of targeted bases, ~20K variants per sample, and ~0.33 lanes of HiSeq per sample.
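The relationship between sequenced bases, target size and coverage in the three designs can be sketched in Python. The numbers are taken from the slides (100 Gb and 20 Gb over a ~3 Gb genome, 5 Gb over a 45 Mb exome target); the function name is illustrative. Raw yield divided by target size is only an upper bound on usable coverage. The slides quote lower averages for whole-genome designs, presumably because duplicate, unmapped and low-quality reads are discounted, but that explanation is an assumption here.

```python
def raw_mean_coverage(sequenced_bases: float, target_bases: float) -> float:
    """Upper bound on mean depth: total sequenced bases spread over the target."""
    return sequenced_bases / target_bases


# (sequenced bases, target bases) per sample, as listed on the slides.
designs = {
    "high-pass WGS": (100e9, 3e9),
    "low-pass WGS": (20e9, 3e9),
    "exome capture": (5e9, 45e6),
}

for name, (sequenced, target) in designs.items():
    print(f"{name}: about {raw_mean_coverage(sequenced, target):.0f}x raw coverage")
```

The exome design reaches a much higher depth from 20 times fewer sequenced bases than the high-pass genome, which is what makes it possible to put many exome samples on one lane.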
![Page 34: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/34.jpg)
General variant calling pipelines
Common pattern:
◦ Align reads
◦ Optimize alignment
◦ Call variants
◦ Filter called variants
◦ Annotate
![Page 35: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/35.jpg)
Pipeline examples
Examples:
◦ Broad Genome Analysis Toolkit (GATK)
◦ samtools mpileup
◦ VarScan2
Specialized pipelines
◦ Heteroplasmy, tumor sample analyses
![Page 36: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/36.jpg)
GATK Pipeline
![Page 37: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/37.jpg)
Phase I: NGS Data Processing
![Page 38: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/38.jpg)
Phase I: NGS Data Processing
◦ Alignment of raw reads
◦ Duplicate marking
◦ Base quality recalibration
◦ Local realignment no longer required if you use the HaplotypeCaller
![Page 39: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/39.jpg)
Phase I: Alignment of raw reads
Accuracy
◦ Sensitivity – maps reads accurately, allowing for errors or variation
◦ Specificity – maps to the correct region
Heng Li’s aligner assessment
![Page 40: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/40.jpg)
Phase I: Alignment of raw reads
Accuracy assessed using simulated data
Generally, BWA-MEM or Novoalign are recommended
Unique vs. multi-mapped reads
◦ Should we retain reads mapping to repetitive regions?
◦ May depend on the application
Heng Li’s aligner assessment
![Page 41: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/41.jpg)
Phase I: Alignment of raw reads
Almost 100 read mappers exist (2016): http://wwwdev.ebi.ac.uk/fg/hts_mappers/
![Page 42: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/42.jpg)
Mapping short reads to a reference is simple in principle
[Figure: enormous pile of short reads from NGS, passed through mapping and alignment algorithms, gives reads mapped to the reference genome (Region 1, Region 2, Region 3)]
Identify where the read matches the reference sequence and record match details as CIGAR string
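As a concrete illustration of the CIGAR string mentioned above, here is a small Python sketch that splits a CIGAR into operations and derives how many reference and read bases it covers. The operation letters and which of them consume reference or read bases follow the SAM specification; the function names and the example string are chosen here for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")    # operations that advance along the reference
READ_CONSUMING = set("MIS=X")   # operations that advance along the read


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '8M2I4M1D3M' into (length, op) pairs."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Reject anything the pattern did not fully account for (e.g. '*').
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return ops


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in REF_CONSUMING)


def read_length(cigar: str) -> int:
    """Number of read bases described by the CIGAR (hard clips excluded)."""
    return sum(n for n, op in parse_cigar(cigar) if op in READ_CONSUMING)


cigar = "8M2I4M1D3M"  # 8 aligned, 2 inserted, 4 aligned, 1 deleted, 3 aligned
print(parse_cigar(cigar))
print(reference_span(cigar), read_length(cigar))  # 16 17
```

The 2-base insertion adds to the read length but not to the reference span, and the 1-base deletion does the opposite.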
![Page 43: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/43.jpg)
But mapping is actually very hard because of mismatches (true mutations or sequencing errors), duplicated regions, etc.
[Figure: enormous pile of short reads from NGS, passed through mapping and alignment algorithms, placed on the reference genome. Labels: Region 1, Region 2A, Region 2B; High MQ, Low MQ]
Mapping algorithms account for this by choosing the most likely placement → mapping quality (MQ)
For more information see: Li and Homer (2010). A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics.
![Page 44: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/44.jpg)
Alignment output: SAM/BAM
SAM – Sequence Alignment/Map format
◦ SAM file format stores alignment information
◦ Normally converted into BAM (text format is mostly useless for analysis)
Specification: http://samtools.sourceforge.net/SAM1.pdf
Contains FASTQ reads, quality information, metadata, alignment information, etc.
Files are typically very large: many 100s of GB or more
![Page 45: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/45.jpg)
Alignment output: SAM/BAM
BAM – BGZF compressed SAM format
◦ May be unsorted, or sorted by sequence name or genome coordinates
◦ May be accompanied by an index file (.bai) (only if coordinate-sorted)
◦ Makes the alignment information easily accessible to downstream applications (large genome file not necessary)
◦ Relatively simple format makes it easy to extract specific features, e.g. genomic locations
◦ BAM is the compressed/binary version of SAM and is not human readable. Uses a specialized compression algorithm optimized for indexing and record retrieval (bgzip)
Files are typically about 1/5 the size of SAM, but still very large
![Page 46: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/46.jpg)
Alignment output: SAM/BAM
[Figure labels: Alignment, SAM format]
![Page 47: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/47.jpg)
Alignment output: SAM/BAM
SAM format
![Page 48: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/48.jpg)
SAM format
![Page 49: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/49.jpg)
SAM format
![Page 50: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/50.jpg)
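Bit Flags

| Hex | 0x80 | 0x40 | 0x20 | 0x10 | 0x8 | 0x4 | 0x2 | 0x1 |
|---|---|---|---|---|---|---|---|---|
| Bit | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
| r001 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |

r001: 128 + 32 + 2 + 1 = 163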
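The flag value 163 can be unpacked programmatically. The sketch below covers only the eight bits shown on the slide (the SAM specification defines further bits, such as 0x400 for duplicates); the short descriptions are paraphrases of the specification, and the names `SAM_FLAGS` and `decode_flag` are illustrative.

```python
# Bit value -> short description, for the eight flag bits shown on the slide.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read on reverse strand",
    0x20: "mate on reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
}


def decode_flag(flag: int) -> list[str]:
    """Return the descriptions of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(f"{163:#010b}")      # 0b10100011
for description in decode_flag(163):
    print(description)
```

For 163 the set bits are 0x1, 0x2, 0x20 and 0x80: a paired read, mapped in a proper pair, whose mate is on the reverse strand, and which is the second read of its pair.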
![Page 51: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/51.jpg)
SAM format
![Page 52: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/52.jpg)
SAM format
![Page 53: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/53.jpg)
SAM format
![Page 54: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/54.jpg)
CIGAR
![Page 55: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/55.jpg)
SAM format
![Page 56: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/56.jpg)
SAM format
![Page 57: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/57.jpg)
SAM format
![Page 58: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/58.jpg)
SAM format
![Page 59: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/59.jpg)
Alignment output: SAM/BAM
Too many to go over!!!
![Page 60: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/60.jpg)
Alignment output: SAM/BAM
Tools
◦ samtools
◦ Picard
Mining information from a properly formatted BAM file:
◦ Reads in a region (good for RNA-Seq, ChIP-Seq)
◦ Quality of alignments
◦ Coverage
◦ …and of course, differences (variants)
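As a rough illustration of the questions listed above, the sketch below parses a few SAM text records in plain Python and reports the reads overlapping a region, their mean mapping quality, and per-base coverage. The records, the region, and the helper names (`reference_span`, `parse_sam_line`) are invented for illustration; in practice samtools or Picard would answer these questions directly from a BAM file.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_span(cigar):
    """Number of reference bases consumed by a CIGAR string (M, D, N, =, X)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")

def parse_sam_line(line):
    """Return (name, chrom, start, end, mapq); coordinates are 1-based, inclusive."""
    f = line.rstrip("\n").split("\t")
    start = int(f[3])
    end = start + reference_span(f[5]) - 1
    return f[0], f[2], start, end, int(f[4])

# Hypothetical records: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
sam_lines = [
    "r1\t0\tchr20\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr20\t105\t30\t4M2D6M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t16\tchr20\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
reads = [parse_sam_line(line) for line in sam_lines]

# Reads overlapping a region of interest (assumed region, for illustration)
region = ("chr20", 100, 120)
in_region = [r for r in reads
             if r[1] == region[0] and r[2] <= region[2] and r[3] >= region[1]]

# Quality of alignments: mean MAPQ of the reads in the region
mean_mapq = sum(r[4] for r in in_region) / len(in_region)

# Coverage: count reads spanning each position
# (simplification: bases deleted from a read still count as covered)
coverage = {}
for _, _, start, end, _ in in_region:
    for pos in range(start, end + 1):
        coverage[pos] = coverage.get(pos, 0) + 1

print([r[0] for r in in_region], mean_mapq, coverage[105])
```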
![Page 61: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/61.jpg)
Phase I: Duplicate Marking
The reason why duplicates are bad
Reference genome
Reads mapped to reference
FP variant call (bad)
= sequencing error propagated in duplicates
After marking duplicates, the GATK will only see:
… and thus be more likely to make the right call
![Page 62: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/62.jpg)
Phase I: Duplicate Marking
Duplicates have the same starting position and the same CIGAR string
Reference genome
Reads mapped to reference
Easy to bag & tag
Hey, Picard has an app for that!
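The "same starting position and same CIGAR string" rule can be sketched in a few lines of Python. This is only a toy version of the idea: the read dictionaries, the use of strand in the grouping key, and the "keep the read with the highest summed base quality" tie-break are assumptions made for illustration, and real duplicate marking considers more than this (read pairs and clipping, for example).

```python
from collections import defaultdict

def mark_duplicates(reads):
    """Flag all but the best read in each (chrom, pos, strand, cigar) group.

    Each read is a dict with keys: name, chrom, pos, strand, cigar, quals
    (a list of base qualities). Returns the set of names flagged as duplicates.
    """
    groups = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["pos"], read["strand"], read["cigar"])
        groups[key].append(read)

    duplicates = set()
    for group in groups.values():
        # Keep the read with the highest summed base quality; flag the rest.
        best = max(group, key=lambda r: sum(r["quals"]))
        duplicates.update(r["name"] for r in group if r is not best)
    return duplicates

# Hypothetical reads: a and b share position and CIGAR, c differs by CIGAR
reads = [
    {"name": "a", "chrom": "20", "pos": 100, "strand": "+", "cigar": "10M", "quals": [30] * 10},
    {"name": "b", "chrom": "20", "pos": 100, "strand": "+", "cigar": "10M", "quals": [20] * 10},
    {"name": "c", "chrom": "20", "pos": 100, "strand": "+", "cigar": "4M1I5M", "quals": [30] * 10},
    {"name": "d", "chrom": "20", "pos": 250, "strand": "-", "cigar": "10M", "quals": [30] * 10},
]
print(sorted(mark_duplicates(reads)))  # ['b']
```

Note that the duplicate is flagged rather than deleted, which mirrors the "bag & tag" wording above.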
![Page 63: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/63.jpg)
Phase I: Sorting, Read Groups
A quick diversion about sorting and read groups
The information for this: … is actually stored as a text file with one line per read which from far away looks like this:
The reads are in no particular order…
… but the GATK wants reads to be sorted by starting position like this:
So we need to explicitly sort the SAM file…
Hey, Picard has an app for that too!
And while we’re at it, let’s add read group information if it isn’t already there, so the GATK will know what read belongs to what sample (that’s kind of important).
… and Picard has an app for that!
![Page 64: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/64.jpg)
Terminology
Read groups – information about the samples and how they were run
◦ ID – Simple unique identifier
◦ Library
◦ Sample name
◦ Platform – sequencing platform
◦ Platform unit – barcode or identifier
◦ Sequencing center (optional)
◦ Description (optional)
◦ Run date (optional)
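In a SAM/BAM header these fields end up on an `@RG` line, using the two-letter tags from the SAM specification (ID, LB, SM, PL, PU, CN, DS, DT). The sketch below builds such a line; the function name `format_read_group` and the example values are illustrative assumptions, not part of any tool.

```python
def format_read_group(rg_id, library, sample, platform, platform_unit,
                      center=None, description=None, run_date=None):
    """Build a tab-delimited SAM @RG header line from read group fields."""
    fields = [
        ("ID", rg_id),          # simple unique identifier
        ("LB", library),        # library
        ("SM", sample),         # sample name
        ("PL", platform),       # sequencing platform
        ("PU", platform_unit),  # platform unit (barcode or identifier)
        ("CN", center),         # sequencing center (optional)
        ("DS", description),    # description (optional)
        ("DT", run_date),       # run date (optional)
    ]
    parts = ["@RG"] + [f"{tag}:{value}" for tag, value in fields if value is not None]
    return "\t".join(parts)

# Example values chosen for illustration only
line = format_read_group("id", "solexa-123", "NA12878", "illumina", "AXL2342",
                         center="bi", run_date="2011-12-12")
print(line)
```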
![Page 65: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/65.jpg)
Phase I: Sorting, Read Groups
Typical workflow using Picard tools to mark duplicates et al.
Original SAM → sort → Ordered BAM → “dedup” → Dedupped BAM → add RG → Final BAM
java -jar SortSam.jar \
    I=input.sam \
    O=output.bam \
    SO=coordinate
java -jar MarkDuplicates.jar \
    I=input.sam \
    O=output.bam \
    M=metrics.txt
java -jar AddOrReplaceReadGroups.jar \
    I=input.bam \
    O=output.bam \
    RGID=id RGLB=solexa-123 \
    RGPL=illumina RGPU=AXL2342 \
    RGSM=NA12878 RGCN=bi \
    RGDT=12/12/2011
Wham, BAM, thank you Picard!
![Page 66: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/66.jpg)
Phase I: Local Realignment
Realign around indels
Common misalignment problem
Can cause problems with base quality recalibration, variant calls
![Page 67: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/67.jpg)
DePristo, M., Banks, E., Poplin, R. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet.
This is what a realigned BAM looks like
Before / After
![Page 68: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/68.jpg)
Phase I: Base Quality Score Recalibration
Quality scores from sequencers are biased and somewhat inaccurate
Quality scores are critical for all downstream analysis
Biases are a major contributor to bad variant calls
Caveat:
◦ In practice, generally requires having a known set of variants (dbSNP)
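To make the idea concrete: a Phred quality Q corresponds to an error probability p via Q = -10·log10(p), and recalibration compares the reported Q with the mismatch rate actually observed at sites that are not known variants. The sketch below is a deliberately reduced version that bins only by reported quality; the function names, the +1/+2 smoothing, and the toy data are assumptions for illustration, and the real GATK model uses additional covariates.

```python
import math
from collections import defaultdict

def phred(p_error):
    """Convert an error probability to a Phred-scaled quality."""
    return -10 * math.log10(p_error)

def empirical_qualities(observations, known_sites):
    """Estimate empirical quality per reported quality bin.

    observations: iterable of (position, reported_q, is_mismatch).
    known_sites: positions of known variants (e.g. from dbSNP); mismatches
    there may be real variation, so those observations are skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [mismatches, total]
    for pos, reported_q, is_mismatch in observations:
        if pos in known_sites:
            continue
        counts[reported_q][0] += int(is_mismatch)
        counts[reported_q][1] += 1
    # Smoothed mismatch rate (assumed +1/+2 correction) converted back to Phred
    return {q: phred((m + 1) / (n + 2)) for q, (m, n) in counts.items()}

# Toy data: 998 bases reported as Q30, 9 of which mismatch the reference,
# plus one mismatch at a known variant site that must be ignored.
observations = [(i, 30, i < 9) for i in range(998)] + [(5000, 30, True)]
known_sites = {5000}
table = empirical_qualities(observations, known_sites)
print(table)  # the Q30 bin comes out at roughly Q20: the sequencer was overconfident
```

This also shows why the caveat matters: without a set of known variants, true variation would be counted as sequencing error and qualities would be pushed too low.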
![Page 69: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/69.jpg)
![Page 70: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/70.jpg)
Phase II: Variant Discovery/Genotyping
![Page 71: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/71.jpg)
Phase II: Variant Calling
This is where we actually call the variants
Prior steps leading up to this help remove potential causes of variant calling errors
I’ll be covering use of the UnifiedGenotyper variant calling tool in GATK
◦ …but you should keep an eye on the newest tool, HaplotypeCaller
![Page 72: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/72.jpg)
Phase II: Variant Calling
In general, uses a probabilistic method, e.g. Bayesian model
◦ Determine the possible SNP and indel alleles
◦ Only “good bases” are included:
  ◦ Those satisfying minimum base quality, mapping read quality, pair mapping quality, etc.
◦ Compute, for each sample, for each genotype, likelihoods of data given genotypes
◦ Compute the allele frequency distribution to determine most likely allele count; emit a variant call if determined
◦ If we are going to emit a variant, assign a genotype to each sample
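The "likelihoods of data given genotypes" step can be sketched for a single sample and a single biallelic site. This is a textbook-style diploid model, not the GATK implementation: it assumes each read comes from either chromosome with probability 0.5, that errors are spread evenly over the three other bases, and it omits the prior and the allele frequency step. The function names and the minimum base quality of 17 are assumptions for illustration.

```python
import math

def base_probability(base, allele, err):
    """P(observed base | true allele), spreading errors over the 3 other bases."""
    return 1 - err if base == allele else err / 3

def genotype_likelihoods(bases, quals, ref, alt, min_base_q=17):
    """Return log10 P(pileup | genotype) for genotypes 0/0, 0/1 and 1/1."""
    genotypes = {"0/0": (ref, ref), "0/1": (ref, alt), "1/1": (alt, alt)}
    log_lik = {g: 0.0 for g in genotypes}
    for base, q in zip(bases, quals):
        if q < min_base_q:          # only "good bases" are included
            continue
        err = 10 ** (-q / 10)       # Phred quality -> error probability
        for g, (a1, a2) in genotypes.items():
            p = 0.5 * base_probability(base, a1, err) + 0.5 * base_probability(base, a2, err)
            log_lik[g] += math.log10(p)
    return log_lik

# Toy pileup: 5 reference (A) and 5 alternate (G) bases, all Q30
lik = genotype_likelihoods("AAAGGGAGAG", [30] * 10, ref="A", alt="G")
best = max(lik, key=lik.get)
print(best)  # 0/1
```

With a balanced pileup the heterozygous genotype wins clearly; an intermediate split between reference and alternate reads is where the likelihoods become close and the call less certain.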
![Page 73: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/73.jpg)
Het: REF 50%, ALT 50%
Hom: REF 0%, ALT 100%
??: REF 77%, ALT 23%
![Page 74: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/74.jpg)
Variant calling output: VCF
VCF (Variant Call Format)
Like SAM/BAM, also has a versioned specification
◦ From the 1000 Genomes Project
◦ http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
![Page 75: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/75.jpg)
Formats: VCF
##fileformat=VCFv4.1
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
![Page 84: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/84.jpg)
VCF

| Col | Field | Description |
|---|---|---|
| 1 | CHROM | Chromosome name |
| 2 | POS | 1-based position. For an indel, this is the position preceding the indel. |
| 3 | ID | Variant identifier. Usually the dbSNP rsID. |
| 4 | REF | Reference sequence at POS involved in the variant. For a SNP, it is a single base. |
| 5 | ALT | Comma delimited list of alternative sequence(s). |
| 6 | QUAL | Phred-scaled probability of all samples being homozygous reference. |
| 7 | FILTER | Semicolon delimited list of filters that the variant fails to pass. |
| 8 | INFO | Semicolon delimited list of variant information. |
| 9 | FORMAT | Colon delimited list of the format of individual genotypes in the following fields. |
| 10+ | Sample(s) | Individual genotype information defined by FORMAT. |
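The column layout above maps naturally onto a small parser. The sketch below splits one data line from the example VCF into its fixed columns, INFO key/value pairs, and per-sample fields, then decodes a genotype into bases. The function names are hypothetical, and the whitespace split is a convenience for this example only (real VCF files are tab-delimited); for real work an established VCF library would be the safer choice.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]

def parse_vcf_record(line, sample_names):
    """Parse one VCF data line into a dict (columns 1-9 plus per-sample fields)."""
    fields = line.split()  # tolerant split for this example; VCF proper uses tabs
    record = dict(zip(VCF_COLUMNS, fields[:9]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")      # comma delimited alternates
    info = {}
    for item in record["INFO"].split(";"):        # semicolon delimited INFO
        key, _, value = item.partition("=")
        info[key] = value if value else True      # flags such as DB carry no value
    record["INFO"] = info
    keys = record["FORMAT"].split(":")            # colon delimited FORMAT keys
    record["samples"] = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }
    return record

def genotype_alleles(record, sample):
    """Translate a GT such as 1|2 into bases: 0 is REF, 1.. index into ALT."""
    alleles = [record["REF"]] + record["ALT"]
    gt = record["samples"][sample]["GT"].replace("|", "/").split("/")
    return [alleles[int(i)] if i != "." else "." for i in gt]

line = ("20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB "
        "GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4")
rec = parse_vcf_record(line, ["NA00001", "NA00002", "NA00003"])
print(genotype_alleles(rec, "NA00001"))  # ['G', 'T']
```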
![Page 85: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/85.jpg)
Phase II: Filtering
Two basic methods:
◦ Hard filtering
◦ Variant quality score recalibration (VQSR)
![Page 86: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/86.jpg)
Phase II: Hard Filtering
Reducing false positives by e.g. requiring
◦ Sufficient depth
◦ Variant to be in >30% reads
◦ High quality
◦ Strand balance
◦ Etc. etc. etc.
Very high dimensional search space
◦ …so, very subjective!
Strand Bias
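A hard filter is just a set of fixed cutoffs applied to each call. The sketch below applies the four criteria named above to per-site read counts; apart from the 30% alternate-read fraction, every threshold (minimum depth, minimum quality, maximum strand skew) and every filter name is an arbitrary assumption, which is exactly the subjectivity the slide warns about.

```python
def hard_filter(site, min_depth=10, min_alt_fraction=0.30,
                min_qual=30.0, max_strand_skew=0.9):
    """Return the names of the filters a site fails (empty list means PASS).

    site: dict with read counts ref_fwd, ref_rev, alt_fwd, alt_rev and a
    Phred-scaled variant quality under the key qual.
    """
    failed = []
    alt = site["alt_fwd"] + site["alt_rev"]
    depth = site["ref_fwd"] + site["ref_rev"] + alt
    if depth < min_depth:
        failed.append("LowDepth")
    if depth == 0 or alt / depth <= min_alt_fraction:
        failed.append("LowAltFraction")
    if site["qual"] < min_qual:
        failed.append("LowQual")
    # Strand balance: flag sites where nearly all ALT reads come from one strand
    if alt > 0 and max(site["alt_fwd"], site["alt_rev"]) / alt > max_strand_skew:
        failed.append("StrandBias")
    return failed

good = {"ref_fwd": 10, "ref_rev": 10, "alt_fwd": 6, "alt_rev": 6, "qual": 50}
biased = {"ref_fwd": 10, "ref_rev": 10, "alt_fwd": 12, "alt_rev": 0, "qual": 50}
shallow = {"ref_fwd": 2, "ref_rev": 1, "alt_fwd": 1, "alt_rev": 0, "qual": 12}
for name, site in [("good", good), ("biased", biased), ("shallow", shallow)]:
    print(name, hard_filter(site) or "PASS")
```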
![Page 87: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/87.jpg)
Phase II: Hard Filtering
“The effect of strand bias in Illumina short-read sequencing data”
◦ Guo et al., BMC Genomics 2012, 13:666
◦ Last sentence: “Indiscriminant use of strand bias as a filter will result in a large loss of true positive SNPs”
Strand Bias
![Page 88: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/88.jpg)
Phase II: Variant Quality Score Recalibration (VQSR)
Considered GATK ‘best practice’
Train on trusted variants
Require the new variants to live in the same hyperspace
Potential problems:
◦ Over-fitting
◦ Biasing to features of known SNPs
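The "train on trusted variants, then require new variants to live in the same space" idea can be illustrated with a much simpler model than the one VQSR uses. The sketch below fits one independent Gaussian per annotation on trusted variants and scores new variants by log-density; this single-Gaussian simplification, the function names, and the toy annotation values are all assumptions, chosen only to show the principle.

```python
import math
from statistics import NormalDist

def fit_model(trusted):
    """Fit one Gaussian per annotation from trusted variants (list of dicts)."""
    names = list(trusted[0].keys())
    return {n: NormalDist.from_samples([v[n] for v in trusted]) for n in names}

def score(model, variant):
    """Sum of per-annotation Gaussian log-densities; higher looks more 'trusted'."""
    total = 0.0
    for name, dist in model.items():
        z = (variant[name] - dist.mean) / dist.stdev
        total += -0.5 * z * z - math.log(dist.stdev) - 0.5 * math.log(2 * math.pi)
    return total

# Toy training set: two annotations per trusted variant (values are made up)
trusted = [{"QD": 20 + i % 5, "MQ": 58 + i % 3} for i in range(15)]
model = fit_model(trusted)

typical = {"QD": 22, "MQ": 59}   # looks like the training data
outlier = {"QD": 2, "MQ": 30}    # far from anything seen in training
print(score(model, typical) > score(model, outlier))  # True
```

The sketch also makes the listed problems visible: a novel variant that is real but unlike the training set would score poorly, which is the bias toward features of known SNPs.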
![Page 89: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/89.jpg)
Phase III: Integrative Analysis
![Page 90: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/90.jpg)
Phase III: Functional Annotation
Are these mutations in important regions?
◦ Genes? UTR?
◦ Are they changing the coding sequence?
Would these changes have an effect?
Tools:
◦ SnpEff/SnpSift
◦ Annovar
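The question "are they changing the coding sequence?" comes down to comparing the reference and altered codons. The sketch below classifies a single-base substitution within a coding sequence using the standard genetic code; the function name, the category labels, and the toy sequence are assumptions for illustration, and annotation tools handle far more (transcripts, strands, splice sites, indels).

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_coding_snv(cds, cds_pos, alt_base):
    """Classify a substitution at 0-based offset cds_pos of a sense-strand CDS."""
    codon_start = cds_pos - cds_pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = cds_pos - codon_start
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"

cds = "ATGGAATGGTAA"  # toy CDS: Met, Glu, Trp, stop
print(classify_coding_snv(cds, 5, "G"))  # GAA -> GAG, synonymous
print(classify_coding_snv(cds, 3, "A"))  # GAA -> AAA, missense
print(classify_coding_snv(cds, 8, "A"))  # TGG -> TGA, nonsense
```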
![Page 91: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/91.jpg)
The end of the (pipe)line
![Page 92: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/92.jpg)
Follow-up Quality Control
Transition/Transversion ratio (Ti/Tv)
◦ bcftools can help here
Concordance with known variants: dbSNP, HapMap, 1000 Genomes
Others?

| Condition | Expected Ti/Tv |
|---|---|
| random | 0.5 |
| whole genome | 2.0-2.1 |
| exome | 3.0-3.5 |
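Ti/Tv is easy to compute once SNVs are available as (REF, ALT) pairs: transitions stay within purines (A, G) or within pyrimidines (C, T), and everything else is a transversion. The sketch below is a minimal version with made-up input, assuming REF and ALT always differ. It also shows where the "random" value of 0.5 in the table comes from: of the 12 possible substitutions, 4 are transitions and 8 are transversions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """True for A<->G or C<->T; assumes ref != alt."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES

def ti_tv_ratio(snvs):
    """Transition/transversion ratio for a list of (ref, alt) single-base pairs."""
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")

# Toy call set: 4 transitions and 2 transversions
snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(snvs))  # 2.0

# Every possible substitution once: 4 transitions, 8 transversions
all_substitutions = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]
print(ti_tv_ratio(all_substitutions))  # 0.5
```

A call set whose ratio drifts from the expected range toward 0.5 is a hint that it contains many random false positives.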
![Page 93: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/93.jpg)
Calling variants on cohorts of samples
When running HaplotypeCaller, can use a specific output type called GVCF (Genomic VCF)
◦ Contains genotype likelihoods and annotations for each site in the genome
Perform joint genotyping calls on cohort
Can rerun as needed if more samples are added to cohort
Used for ExAC cohort (92K exomes)
![Page 94: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/94.jpg)
Calling variants on cohorts of samples
![Page 95: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/95.jpg)
Structural Variation
Tools like GATK, samtools can’t currently detect larger structural changes easily
Alkan et al., Nature Reviews Genetics 12:363, 2011
![Page 96: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/96.jpg)
Structural Variation
With the latest releases of GATK (v3.7 or higher), this is changing:
https://software.broadinstitute.org/gatk/best-practices/
![Page 97: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/97.jpg)
Structural Variation
Detection using NGS data generally requires multi-layer analyses, may focus on specific SV types
Common tools:
◦ CNVnator – gross detection of CNVs
◦ BreakDancer, Cortex – breakpoint detection
◦ Pindel – large deletions
◦ Hydra-SV – read pair discordance
Recent tools (lumpy-sv, GASVPRO) integrate approaches
![Page 98: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/98.jpg)
Structural Variation Strategies
Read pair discordance
◦ Insert size is off, orientation of reads is wrong
Read depth
◦ Region deviates from expected read depth
Split reads
◦ Single read is split, parts align in two distinct unique locations
‘Assembly’
◦ Reference-based local assemblies indicate inconsistencies
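
To make the read pair discordance idea concrete, here is a minimal Python sketch that labels a paired-end alignment by its orientation and insert size. The `ReadPair` type, the three-standard-deviation cutoff, and the suggested SV interpretation for each label are assumptions for this example (standard forward/reverse paired-end libraries are assumed); real callers cluster many supporting pairs before reporting anything.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal view of one aligned read pair."""
    insert_size: int      # distance spanned by the two mates on the reference
    left_forward: bool    # leftmost mate aligned to the forward strand
    right_forward: bool   # rightmost mate aligned to the forward strand


def classify_pair(pair, mean_insert, sd_insert, n_sd=3.0):
    """Label a pair as concordant or as one kind of discordant signal."""
    # Orientation checks: the expected layout is forward mate on the left,
    # reverse mate on the right (mates pointing towards each other).
    if pair.left_forward == pair.right_forward:
        return "same-strand (possible inversion)"
    if not pair.left_forward and pair.right_forward:
        return "outward-facing (possible tandem duplication)"
    # Insert-size checks against the library's expected distribution.
    upper = mean_insert + n_sd * sd_insert
    lower = mean_insert - n_sd * sd_insert
    if pair.insert_size > upper:
        return "insert too large (possible deletion)"
    if pair.insert_size < lower:
        return "insert too small (possible insertion)"
    return "concordant"


# Library with a mean insert size of 400 bp and standard deviation of 50 bp.
print(classify_pair(ReadPair(2000, True, False), 400, 50))
```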
![Page 99: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/99.jpg)
Structural Variation Analyses
![Page 100: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/100.jpg)
Structural Variation Analyses
![Page 101: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/101.jpg)
Structural Variation
Still an active area of research
Problems:
◦ Lots of false positives
◦ Hard to compare methodologies
![Page 102: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/102.jpg)
Association Studies
Genome-wide association studies (GWAS)
Trying to determine whether specific variant(s) in many individuals can be associated with a trait
Ex: comparison of groups of people with a disease (cases) and without (controls)
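
A case/control comparison at a single variant can be sketched as a chi-square test on a 2x2 table of allele counts. The slides do not name a specific test, so the allelic chi-square used here, the function name, and the counts in the example are all assumptions chosen for illustration; real GWAS tools also correct for covariates and population structure.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value). Hypothetical helper for illustration only.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one count")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: the alternate allele is seen 60 times among 100 case
# chromosomes and 40 times among 100 control chromosomes.
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(chi2, p)
```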
![Page 103: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/103.jpg)
Finding the causal variant in ideal situations*
Spot the variant that is common amongst all affected but absent in all unaffected
This variant is in a gene with known function and causes the protein to be disrupted
* e.g. some rare autosomal disease
![Page 104: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/104.jpg)
In reality
You can't spot the difference
◦ You deal with ~3.5 million SNPs
◦ You need to employ methods that systematically identify variants that stand out: GWAS
GWAS taught us that it is unlikely to find a causal common variant for complex diseases
◦ Rare variant?
◦ A bunch of rare and common variants?
◦ An even more complex model?
1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010 Oct 28;467(7319):1061-73. PubMed PMID: 20981092
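
One reason a variant has to "stand out" strongly is the number of tests involved. The sketch below applies a Bonferroni correction, which is an assumption of this example rather than something the slides prescribe, to the ~3.5 million SNPs mentioned above; the function name and the 0.05 significance level are also illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Testing ~3.5 million SNPs at an overall significance level of 0.05.
threshold = bonferroni_threshold(0.05, 3_500_000)
print(f"{threshold:.2e}")
```

Under these assumptions a single SNP would need a p-value on the order of 1e-8 to be called significant, which is part of why underpowered studies struggle to find reproducible associations.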
![Page 105: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/105.jpg)
GWAS
Criticisms of initial GWAS studies
◦ Poor controls
◦ Severely underpowered
◦ Tons of false positives
◦ Not worth the cost
Led to some retractions
Wikipedia article outlines the limitations very well
◦ http://en.wikipedia.org/wiki/Genome-wide_association_study
![Page 106: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/106.jpg)
GWAS
Most of these use genotyping arrays
Use of NGS may help address some of the past limitations
![Page 107: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/107.jpg)
GATK v4
Out now in pre-release
~5x faster with HaplotypeCaller
Completely open-source, but has specific software requirements
https://software.broadinstitute.org/gatk/download/alpha
![Page 108: Variant Calling - Illinoisveda.cs.uiuc.edu/CompGen2017/lecture/03_Variant_Calling_v2.pdf · variant calling chris fields mayo-illinois computationalgenomics workshop, june 19, 2017](https://reader035.fdocuments.net/reader035/viewer/2022063008/5fbc6306db517e4c6a77c51b/html5/thumbnails/108.jpg)