Resources at HapMap.Org HapMap3 Tutorial Marcela K. Tello-Ruiz Cold Spring Harbor Laboratory.
Resources at HapMap.Org
HapMap3 Tutorial
Marcela K. Tello-Ruiz
Cold Spring Harbor Laboratory
Basic Concepts
[Figure: haplotypes at SNP1 (A/a) and SNP2 (B/b) transmitted from Parent 1 X Parent 2]
High LD -> No Recombination (r2 = 1): only the A B and a b haplotypes appear, so SNP1 “tags” SNP2
Low LD -> Recombination: many possibilities (A B, A b, a B, a b, etc…)
Basic Concepts
                     SNP1        SNP2
alleles:             A/a         B/b
POP allele freqs:    A (80%)     B (60%)
                     a (20%)     b (40%)

                     Person 1    Person 2    Person 3
genotypes (SNP1):    AA          AA          Aa
genotypes (SNP2):    BB          Bb          Bb
phased haplotypes:
  C1                 A B         A B         A B
  C2                 A B         A b         a b
                                             OR
  C1                                         A b
  C2                                         a B
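The phasing ambiguity for Person 3 can be checked by brute force. The sketch below is illustrative only: `phase_options` is a hypothetical helper, not part of any HapMap tool. It lists every pair of haplotypes (C1/C2) consistent with a person's unphased genotypes.

```python
from itertools import product

def phase_options(genotypes):
    """Return every distinct haplotype pair consistent with unphased genotypes.

    `genotypes` holds one two-character string per SNP, e.g. ["Aa", "Bb"].
    """
    results = set()
    # At each SNP either allele may sit on chromosome C1; the other goes to C2.
    for picks in product((0, 1), repeat=len(genotypes)):
        c1 = "".join(g[p] for g, p in zip(genotypes, picks))
        c2 = "".join(g[1 - p] for g, p in zip(genotypes, picks))
        # Sort the pair so that C1/C2 and C2/C1 count as one resolution.
        results.add(tuple(sorted((c1, c2))))
    return sorted(results)

people = {
    "Person 1": ["AA", "BB"],
    "Person 2": ["AA", "Bb"],
    "Person 3": ["Aa", "Bb"],
}
for name, genos in people.items():
    print(name, phase_options(genos))
```

Only the double heterozygote (Person 3) has more than one possible resolution, which is why phase has to be estimated from trios or population data.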
HapMap Glossary
• LD (linkage disequilibrium): For a pair of SNP alleles, a measure of deviation from random association; with no recombination between the two SNPs, LD stays high. Measured by D’, r2, LOD
• Phased haplotypes: Estimated assignment of SNP alleles to chromosomes. Alleles transmitted from the mother lie on the same chromosome (the maternal haplotype), while the father’s alleles form the paternal haplotype.
• Tag SNPs: Minimum SNP set to identify a haplotype. r2 = 1 indicates two SNPs are redundant, so each one perfectly “tags” the other.
• Questions? [email protected]
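As a rough illustration of the LD measures named in the glossary, the sketch below computes D, |D’| and r2 from phased haplotype counts for two biallelic SNPs, using the standard textbook definitions. The function name and inputs are hypothetical and this is not the code used by the HapMap browser.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r2) for two biallelic SNPs from haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at SNP1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at SNP2
    d = p_AB - p_A * p_B          # deviation from random association
    # D' rescales D by the largest value it could take given the allele freqs.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# High-LD case: only A B and a b haplotypes are observed.
print(ld_stats(2, 0, 0, 2))
# No-LD case: all four haplotypes are equally common.
print(ld_stats(1, 1, 1, 1))
```

With only A B and a b haplotypes present, r2 comes out as 1, so either SNP perfectly tags the other.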
HapMap Project
                      Phase 1                          Phase 2                 Phase 3
Samples & POP panels  269 samples (4 panels)           270 samples (4 panels)  1,115 samples (11 panels)
Genotyping centers    HapMap International Consortium  Perlegen                Broad & Sanger
Unique QC+ SNPs       1.1 M                            3.8 M (phase I+II)      1.6 M (Affy 6.0 & Illumina 1M)
Reference             Nature (2005) 437:p1299          Nature (2007) 449:p851  Draft Rel. 1 (May 2008)
Release Notes
• Phase 1+2: Latest Release #24, October 2008 (NCBI build 36):
  3.9 M unique QC+ SNPs --> 1 SNP/700 bp
  http://ftp.hapmap.org/00README.releasenotes_rel24
  – Added back chrX SNPs dropped in previous releases
  – Corrected allele flips from rel#23a
• Phase 3: Draft release #1 (NCBI build 36)
  http://ftp.hapmap.org/genotypes/2008-07_phaseIII/00README.txt
  – HapMap3 sites @ Broad Institute, Sanger Center and Baylor College
Phase 3 Samples
label  population sample                                                                    # samples  QC+ Draft 1
ASW*   African ancestry in Southwest USA                                                    90         71
CEU*   Utah residents with Northern and Western European ancestry from the CEPH collection  180        162
CHB    Han Chinese in Beijing, China                                                        90         82
CHD    Chinese in Metropolitan Denver, Colorado                                             100        70
GIH    Gujarati Indians in Houston, Texas                                                   100        83
JPT    Japanese in Tokyo, Japan                                                             91         82
LWK    Luhya in Webuye, Kenya                                                               100        83
MEX*   Mexican ancestry in Los Angeles, California                                          90         71
MKK*   Maasai in Kinyawa, Kenya                                                             180        171
TSI    Toscans in Italy                                                                     100        77
YRI*   Yoruba in Ibadan, Nigeria                                                            180        163
Total                                                                                       1,301      1,115
* Population is made of family trios
Phase 3
• 11 panels & 1,115 samples
  – 558/557 males/females
  – 924/191 founders/non-founders
• Platforms:
  – Illumina Human 1M (Sanger)
  – Affymetrix SNP 6.0 (Broad)
• EXCLUDED from QC+ data set:
  – Samples with low completeness, and SNPs with low call rate in each pop (< 80%) and not in HWE (p < 0.001)
  – Overall false positive rate: ~3.2%
• Data merged with PLINK (concordance over 249,889 overlapping SNPs = 0.9931)
• Alleles on the (+/fwd) strand of NCBI b36
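The two SNP-level exclusion criteria (call rate below 80% in a population, HWE p < 0.001) can be sketched as follows. This is a minimal illustration, not the consortium's QC pipeline: it assumes a 1-df chi-square goodness-of-fit test for HWE (an exact test may have been used in practice), and all function names are hypothetical.

```python
import math

def hwe_p_value(n_AA, n_Aa, n_aa):
    """P-value of a 1-df chi-square goodness-of-fit test for HWE."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0                    # monomorphic SNP: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(n_AA, n_Aa, n_aa, n_missing, min_call_rate=0.80, min_hwe_p=0.001):
    """Apply the call-rate and HWE thresholds quoted above to one SNP in one pop."""
    called = n_AA + n_Aa + n_aa
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    if call_rate < min_call_rate:
        return False
    return hwe_p_value(n_AA, n_Aa, n_aa) >= min_hwe_p

print(passes_qc(25, 50, 25, 0))    # balanced genotype counts, no missing calls
print(passes_qc(50, 0, 50, 0))     # no heterozygotes at all: strong HWE departure
print(passes_qc(25, 50, 25, 30))   # too many missing calls
```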
Phase 3: Draft Release 1
samples  panel  QC+ SNPs   poly QC+ SNPs
71       ASW    1,632,186  1,536,247
162      CEU    1,634,020  1,403,896
82       CHB    1,637,672  1,311,113
70       CHD    1,619,203  1,270,600
83       GIH    1,631,060  1,391,578
82       JPT    1,637,610  1,272,736
83       LWK    1,631,688  1,507,520
71       MEX    1,614,892  1,430,334
171      MKK    1,621,427  1,525,239
77       TSI    1,629,957  1,393,925
163      YRI    1,634,666  1,484,416
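To compare panels, it can help to express the second count as a share of the first. The sketch below does that arithmetic on the Draft Release 1 numbers, reading “poly” as “polymorphic” (an assumption); the variable and function names are illustrative.

```python
# (samples, QC+ SNPs, poly QC+ SNPs) per panel, copied from the Draft Release 1 table.
DRAFT1 = {
    "ASW": (71, 1632186, 1536247),
    "CEU": (162, 1634020, 1403896),
    "CHB": (82, 1637672, 1311113),
    "CHD": (70, 1619203, 1270600),
    "GIH": (83, 1631060, 1391578),
    "JPT": (82, 1637610, 1272736),
    "LWK": (83, 1631688, 1507520),
    "MEX": (71, 1614892, 1430334),
    "MKK": (171, 1621427, 1525239),
    "TSI": (77, 1629957, 1393925),
    "YRI": (163, 1634666, 1484416),
}

def polymorphic_fraction(table):
    """Share of QC+ SNPs that are polymorphic in each panel."""
    return {pop: poly / qc for pop, (_, qc, poly) in table.items()}

fractions = polymorphic_fraction(DRAFT1)
# Print panels from the lowest to the highest polymorphic share.
for pop, frac in sorted(fractions.items(), key=lambda kv: kv[1]):
    print(f"{pop}: {frac:.1%}")
```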
Phase 3 Data
• HapMap format: http://ftp.hapmap.org/genotypes/2008-07_phaseIII/hapmap_format
  * Excluded 1,527 SNPs with strandedness issues & 411 indels
• PLINK format: http://ftp.hapmap.org/genotypes/2008-07_phaseIII/plink_format
• HapMap3 sites:
  Broad - http://www.broad.mit.edu/~debakker/p3.html
  Sanger - http://www.sanger.ac.uk/humgen/hapmap3/
  Baylor - http://www.hgsc.bcm.tmc.edu/projects/human/
Goals of This Tutorial
This tutorial will show you how to:
• Find HapMap3 SNPs near a gene or region of interest (ROI)
  – Visualize allele frequencies in HapMap3 populations
  – Download SNP genotypes in ROI for use in Haploview 4.1
  – Identify GWA hits in the vicinity of ROI & visualize in the context of all chromosomes (karyogram)
  – Add custom data onto the GWAs karyogram
  – Add custom tracks of association data onto ROI
  – Create publication-quality images
• Download the entire HapMap3 data set in bulk
  – Distinguish genotype data in PLINK and HapMap formats
• Visualize LD patterns, find tag SNPs, impute genotypes using release #24 (phase 1+2)
• Generate customized extracts of the entire dataset using HapMart
1: Surf to the HapMap Browser
1a. Go to www.hapmap.org
1b. Select “HapMap phase 3”
2: Search for TCF7L2
2. Type search term – “TCF7L2”
Search for a gene name, a chromosome band, or a phrase like “insulin receptor”
3: Examine Region
Region view puts your ROI in genomic context
Chromosome-wide summary data is shown in overview
Default tracks show HapMap genotyped SNPs, refGenes with exon/intron splicing patterns, etc.
3: This exonic region has many typed SNPs.
Click on ruler to re-center image.
3: Examine Region (cont)
As you zoom in further, the display changes to include more detail
Use the Scroll/Zoom buttons and menu to change position & magnification
3: Mouse over a SNP to see allele frequency table
Click to go to SNP details page
4: Generate Text Reports
4: Select the desired “Download” option and press “Go” or “Configure”
Available phase 3 downloads:
- Individual genotypes
- Population allele & genotype frequencies
4: Generate Reports (cont)
The Genotype download format can be saved to disk or loaded directly into Haploview v4.1
5: Find GWA hits
5a: Scroll down to turn on GWA studies tracks in overview & region panels
5b: Find GWA hits in nearby region. Click on a GWA hit to re-center
5: Find GWA hits (cont)
5c: Mouse over & click on GWA hit for more info
6: Examine GWA hits in entire genome
6: From www.hapmap.org, select “Karyogram”
6: Custom GWA hits in karyogram
6: Follow these instructions to upload your own GWA data
Detailed help on the format is under the “Help” link
7: Create your own tracks
7: Upload example file: TCF7L2_annotations.txt
Example:
• Interested in T2DM genetics
• Create file with custom annotations from http://www.broad.mit.edu/diabetes and superimpose on the HapMap
Detailed help on the format is under the “Help” link
7: Create your own tracks (cont)
Save as a text file!
Some SNPs were typed (known platform) and others were imputed. Format data for both typed & imputed SNPs.
Scores allow you to display data in quantitative form, such as XY plots
7: Create your own tracks (cont)
Remember to point your browser to the location of your annotations (TCF7L2 gene in this case).
Make edits on your own browser window by clicking on “Edit File…”
8: Create Image for Publication
8a. Click on “High-res Image”
Click on the +/- sign to hide/show a section
Mouse over a track until a cross appears. Click on track name to drag track up or down.
8b. Click on “View SVG Image in new browser window”
8c. Save generated file with “.svg” extension
Can view file in Firefox, but use other programs (Adobe Illustrator or Inkscape) to convert to other formats and/or edit
8: Image for Publication (cont)
Inkscape is free and lets you edit and convert to other formats (many journals prefer EPS)
9. Bulk downloads
9. From www.hapmap.org, click on “Bulk Data Download”
Or directly click on “Data”
9. Bulk downloads
Download the entire HapMap3 data set to your own computer
Analytic results (LD & phased haplotype data available for HapMap3)
HapMap Samples
Protocols & assay design
Your own copy of the HapMap Browser
9a. Select “Genotypes”
HapMap3 genotypes & frequencies
Also available at http://ftp.hapmap.org
9. Bulk downloads (cont)
9b. Click on hapmap_format/forward to download genotypes
Also at http://ftp.hapmap.org/genotypes/latest_phaseIII_ncbi_b36/
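Once a genotype file has been downloaded, allele frequencies can be tallied with a few lines of Python. The sketch below rests on assumptions about the HapMap text format that should be checked against the README in the download directory: 11 leading metadata columns (rs#, alleles, chrom, pos, strand, assembly#, center, protLSID, assayLSID, panelLSID, QCcode), one two-letter genotype per sample after that, and “NN” for missing calls. The demo header and row are made up for illustration.

```python
from collections import Counter

N_META = 11  # assumed number of metadata columns before the per-sample genotypes

def allele_freqs(lines):
    """Return (sample IDs, {SNP id: {allele: frequency}}) from HapMap-format lines."""
    header = lines[0].split()
    samples = header[N_META:]
    freqs = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue
        counts = Counter()
        for genotype in fields[N_META:]:
            if "N" in genotype:       # skip missing calls such as "NN"
                continue
            counts.update(genotype)   # counts each of the two allele letters
        total = sum(counts.values())
        freqs[fields[0]] = {a: c / total for a, c in counts.items()} if total else {}
    return samples, freqs

# Made-up two-line file: one header and one SNP typed in four hypothetical samples.
demo = [
    "rs# alleles chrom pos strand assembly# center protLSID assayLSID panelLSID QCcode S1 S2 S3 S4",
    "snp_demo A/G chr10 1000 + ncbi_b36 demo_center p1 a1 panel1 QC+ AA AG GG NN",
]
samples, freqs = allele_freqs(demo)
print(samples, freqs)
```

For a real file, pass `open(path).read().splitlines()` in place of `demo`.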
10: Surf to the HapMap phase 1+2 genome browser
10. Go to www.hapmap.org & select “HapMap Genome Browser B36”
11: Search for TCF7L2
11. Type search term – “TCF7L2”
12: Examine Region
12. Re-center & zoom in
12: Turn on LD & Haplotype Tracks
12a: Scroll down to the “Tracks” section. Turn on the LD Plot and Haplotype Display tracks.
12b: Press “Update Image”
These sections allow you to adjust the display and to superimpose your own data on the HapMap
13: View variation patterns
Triangle plot shows LD values using r2 or D’/LOD scores in one or more HapMap populations
Phased haplotype track shows all 120 chromosomes with alleles colored yellow and blue
14: Adjust Track Settings (on the spot)
14a. Click on question mark preceding track name
14b. Adjust population and display settings & press “Configure”
14: Adjust Track Settings (cont)
Select the analysis track to adjust and press “Configure”
15: Turn on Tag SNP Track
15: Activate the “tag SNP Picker” and press “Update Image”
16: Adjust tag SNP picker
Tag SNPs are selected on the fly as you navigate around the genome
16a: Click on question mark behind “tag SNP Picker”
Alternatively, you may select “Annotate tag SNP Picker” and press “Configure…”
16: Adjust tag SNP picker (cont)
Select population
Select tagging algorithm and parameters
[optional] upload list of SNPs to be included, excluded, or design scores
16b: Press “Configure” to save changes
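The slides do not say which tagging algorithms the picker offers. As one plausible illustration, the sketch below implements a simple greedy pairwise tagger: repeatedly pick the SNP that is in high r2 with the most still-untagged SNPs. The 0.8 threshold, the SNP names and the function name are all assumptions made for the example.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedy pairwise tagging.

    `r2` maps (snp_a, snp_b) tuples to pairwise r2; missing pairs count as 0.
    """
    def linked(a, b):
        if a == b:
            return True  # every SNP tags itself
        return r2.get((a, b), r2.get((b, a), 0.0)) >= threshold

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP covering the most untagged SNPs (ties: alphabetical order).
        best = max(sorted(snps), key=lambda s: sum(linked(s, u) for u in untagged))
        tags.append(best)
        untagged -= {u for u in untagged if linked(best, u)}
    return tags

# Hypothetical four-SNP region with made-up pairwise r2 values.
snps = ["snp1", "snp2", "snp3", "snp4"]
r2 = {("snp1", "snp2"): 1.0, ("snp2", "snp3"): 0.9, ("snp3", "snp4"): 0.2}
print(greedy_tag_snps(snps, r2))
```

Here snp2 covers snp1 and snp3, and snp4 has to tag itself, so two tag SNPs capture all four.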
17: Impute genotypes using HapMap Data
• Interested in the VAV1 gene
• Commercially available platforms with few overlapping SNPs in this region
• HapMap genotyped lots of SNPs in region
Use genotypes for HapMap SNPs to impute genotypes & compare non-overlapping SNP sets!
17: Impute genotypes using MACH1
17a. Go to chr19:6,765,000..6,900,000
17b. Select “Download Impute Data”, click “Configure”
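Region strings like the one in step 17a are easy to split into chromosome, start and end when scripting around the browser. A minimal sketch, assuming the `chrN:start..end` layout with optional thousands separators shown above; the helper name is hypothetical.

```python
import re

# chromosome name, then start..end with optional commas inside the numbers
REGION_RE = re.compile(r"^(chr\w+):([\d,]+)\.\.([\d,]+)$")

def parse_region(text):
    """Split 'chr19:6,765,000..6,900,000' into ('chr19', 6765000, 6900000)."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("region start is after region end")
    return chrom, start, end

chrom, start, end = parse_region("chr19:6,765,000..6,900,000")
print(chrom, start, end, "length:", end - start + 1)
```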
17: Configure MACH1
17c. Upload input files: example.dat & example.ped. Enter e-mail address. Click “Go”
17: Impute genotypes: Input files
• example.dat (20 user-provided SNPs; all should be part of the HapMap):
M rs4807101
M rs164022
M rs625828
M rs461970
M rs331684
…
• example.ped (genotypes for 336 unrelated inds):
PED00001 IND00001 0 0 2 C/C C/C T/T C/T C/C G/G G/G …
PED00002 IND00002 0 0 1 C/T C/C T/T T/T C/C A/A A/G …
PED00003 IND00003 0 0 2 T/T G/G A/A C/T C/C A/G A/G …
…
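Before uploading, it is worth checking that the two input files agree with each other. The sketch below reads both, assuming the usual pedigree layout (family, individual, father, mother, sex, then one genotype per marker in .dat order). The embedded text is truncated to the first five markers and three individuals shown on the slide, and the function names are illustrative.

```python
DAT_TEXT = """\
M rs4807101
M rs164022
M rs625828
M rs461970
M rs331684
"""

PED_TEXT = """\
PED00001 IND00001 0 0 2 C/C C/C T/T C/T C/C
PED00002 IND00002 0 0 1 C/T C/C T/T T/T C/C
PED00003 IND00003 0 0 2 T/T G/G A/A C/T C/C
"""

def read_dat(text):
    """Marker names from 'M <name>' lines, in file order."""
    return [f[1] for f in (line.split() for line in text.splitlines())
            if len(f) >= 2 and f[0] == "M"]

def read_ped(text, markers):
    """Map (family, individual) to {marker: genotype}."""
    people = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        family, individual = fields[0], fields[1]
        calls = fields[5:]  # columns 3-5 are father, mother and sex
        if len(calls) != len(markers):
            raise ValueError(f"{individual}: {len(calls)} genotypes for {len(markers)} markers")
        people[(family, individual)] = dict(zip(markers, calls))
    return people

markers = read_dat(DAT_TEXT)
people = read_ped(PED_TEXT, markers)
print(markers)
print(people[("PED00002", "IND00002")])
```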
17. Visualize imputed SNPs
Your imputation results appear as an external track that can be edited. Hint: Click on “Help” link below for display options
17d. Return to browser
17e. Click “Edit File”
17. Edit external annotations file
17f. Edit annotations file & “Submit Changes”
17: Impute genotypes: Results
• Info (143 provided & imputed HapMap SNPs)
SNP         Al1  Al2  Freq1   MAF     Quality  Rsq
rs10419572  T    A    0.9041  0.0959  0.8179   0.1069
rs415218    T    A    0.9709  0.0291  0.9427   0.0313
rs4807100   A    G    0.4713  0.4713  0.9790   0.9625
rs4807101   T    C    0.4714  0.4714  0.9803   0.9649
rs1651876   T    C    0.9631  0.0369  0.9277   0.0216
…
• Geno (143 SNPs x 336 inds)
PED00001->IND00001 ML_GENO T/T T/T G/G C/C T/T T/T A/T G/G A/A T/T T/C …
PED00002->IND00002 ML_GENO T/T T/T A/G T/C T/T T/T A/T G/G A/A T/T T/C …
PED00003->IND00003 ML_GENO T/T T/T A/A T/T T/T T/T A/T G/G A/A T/T T/T …
…
• Dose (allele dosage)
PED00001->IND00001 ML_DOSE 1.719 1.911 0.004 0.003 1.913 1.980 1.246 1.884 1.949 1.948 1.302 …
PED00002->IND00002 ML_DOSE 1.861 1.957 1.000 1.000 1.952 1.892 1.086 1.909 1.949 1.948 1.096 …
PED00003->IND00003 ML_DOSE 1.994 1.999 1.993 1.995 1.955 1.656 1.297 1.863 1.987 1.988 1.374 …
…
Probability of match imputed:experimental genotype (1.0 for provided markers)
17g. Check your e-mail for text results
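The Info table is the natural place to decide which imputed SNPs to carry forward. A minimal sketch using the five rows shown on the slide: it parses the table and keeps SNPs whose Rsq reaches a cutoff. The 0.3 cutoff is an assumption chosen for illustration, not a value from the tutorial, and the function names are hypothetical.

```python
INFO_TEXT = """\
SNP Al1 Al2 Freq1 MAF Quality Rsq
rs10419572 T A 0.9041 0.0959 0.8179 0.1069
rs415218 T A 0.9709 0.0291 0.9427 0.0313
rs4807100 A G 0.4713 0.4713 0.9790 0.9625
rs4807101 T C 0.4714 0.4714 0.9803 0.9649
rs1651876 T C 0.9631 0.0369 0.9277 0.0216
"""

def read_info(text):
    """Parse the whitespace-separated Info table into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        for key in ("Freq1", "MAF", "Quality", "Rsq"):
            row[key] = float(row[key])
        rows.append(row)
    return rows

def well_imputed(rows, min_rsq=0.3):
    """Names of SNPs whose Rsq is at least `min_rsq` (assumed cutoff)."""
    return [row["SNP"] for row in rows if row["Rsq"] >= min_rsq]

rows = read_info(INFO_TEXT)
print(well_imputed(rows))
```

In this excerpt the three SNPs that fall below the cutoff are also the three with MAF under 0.1.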
18. Use HapMart to Generate Extracts of the HapMap Dataset
Find all HapMap characterized SNPs that:
1. Have a MAF > 0.20 in the Yoruban population panel (YRI)
2. Cause a nonsynonymous amino acid change
3. Were typed by Perlegen
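HapMart builds this query through its web forms. The same three filters can be expressed in plain Python to make the logic explicit. Everything below is hypothetical: the record layout, field names and SNP entries are invented for illustration and do not reflect HapMart's actual schema or data.

```python
# Made-up records; a real extract would come from a HapMart export.
records = [
    {"id": "snp_a", "maf": {"YRI": 0.31}, "consequence": "nonsynonymous", "center": "perlegen"},
    {"id": "snp_b", "maf": {"YRI": 0.12}, "consequence": "nonsynonymous", "center": "perlegen"},
    {"id": "snp_c", "maf": {"YRI": 0.45}, "consequence": "synonymous", "center": "perlegen"},
    {"id": "snp_d", "maf": {"YRI": 0.27}, "consequence": "nonsynonymous", "center": "broad"},
]

def hapmart_like_filter(rows, panel="YRI", min_maf=0.20,
                        consequence="nonsynonymous", center="perlegen"):
    """Keep SNP ids that satisfy all three criteria listed above."""
    return [
        row["id"]
        for row in rows
        if row["maf"].get(panel, 0.0) > min_maf     # 1. MAF > 0.20 in the panel
        and row["consequence"] == consequence       # 2. nonsynonymous change
        and row["center"] == center                 # 3. typed by Perlegen
    ]

print(hapmart_like_filter(records))
```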
Further Information
• HapMap Publications & Guidelines
  http://hapmap.cshl.org/publications.html.en
• Past tutorials & user’s guide to HapMap.org
  http://www.hapmap.org/tutorials.html.en
HapMap DCC Present Members (CSHL)
Lincoln Stein
Marcela K. Tello-Ruiz
Zhenyuan Lu
Wei Zhao
HapMap DCC Former Members
Lalitha Krishnan
Albert Vernon Smith
Gudmundur Thorisson
Fiona Cunningham