Polymorphism Haixu Tang School of Informatics. Genome variations underlie phenotypic differences...


Page 1: Polymorphism Haixu Tang School of Informatics. Genome variations underlie phenotypic differences cause inherited diseases.

Polymorphism

Haixu Tang

School of Informatics

Page 2

Genome variations

• underlie phenotypic differences

• cause inherited diseases

Page 3

Restriction fragment length polymorphism (RFLP)

Page 4

RFLP

Haplotype

Page 5

Microsatellite (short tandem repeat) polymorphism

the repeat region is variable between samples while the flanking regions where PCR primers bind are constant

[Figure: two alleles of an AATG repeat, one with 7 repeats and one with 8 repeats]
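The point that the flanking regions stay constant while the repeat count varies can be sketched in a few lines of Python. This is only an illustration, not a genotyping pipeline: `count_repeats` is a hypothetical helper, and the flank sequences and the two alleles are invented for the example.

```python
import re

def count_repeats(allele, left_flank, right_flank, unit="AATG"):
    """Count tandem copies of `unit` between two constant flanking sequences.

    Returns None if the flanks are not found around a pure run of `unit`.
    """
    pattern = (re.escape(left_flank)
               + "((?:" + re.escape(unit) + ")+)"
               + re.escape(right_flank))
    match = re.search(pattern, allele)
    if match is None:
        return None
    return len(match.group(1)) // len(unit)

# Hypothetical flanks (where PCR primers would bind) and two made-up alleles.
LEFT, RIGHT = "GGTCA", "TTCGA"
allele_7 = LEFT + "AATG" * 7 + RIGHT
allele_8 = LEFT + "AATG" * 8 + RIGHT

print(count_repeats(allele_7, LEFT, RIGHT))  # 7
print(count_repeats(allele_8, LEFT, RIGHT))  # 8
```

In a real assay the repeat count is usually inferred from the length of the PCR product rather than from the sequence itself; the string match here only stands in for that measurement.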

Page 6

Which suspect, A or B, cannot be excluded from potential perpetrators of this assault?

Page 7

Single nucleotide polymorphism

• The polymorphism with the highest possible density

• A SNP is defined as a single base change in a DNA sequence that occurs in a significant proportion (more than 1 percent) of a large population.
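The "more than 1 percent" rule can be written as a simple check on allele counts at one site. The sketch below assumes that "significant proportion" refers to the frequency of the second most common base; the function names and the sample counts are invented for illustration.

```python
from collections import Counter

def minor_allele_frequency(alleles):
    """Frequency of the second most common base at one site (0.0 if monomorphic)."""
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0
    return counts[1][1] / len(alleles)

def is_snp(alleles, threshold=0.01):
    """Apply the 'more than 1 percent' rule from the definition above."""
    return minor_allele_frequency(alleles) > threshold

# Made-up samples of 1000 chromosomes at a single site.
common_variant = ["A"] * 950 + ["G"] * 50   # variant base at 5 percent
rare_variant = ["A"] * 995 + ["G"] * 5      # variant base at 0.5 percent

print(is_snp(common_variant))  # True
print(is_snp(rare_variant))    # False
```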

Page 8

Some Facts

• In human beings, 99.9 percent of bases are the same.

• The remaining 0.1 percent makes a person unique.

– Different attributes / characteristics / traits

• how a person looks,

• diseases he or she develops.

• These variations can be:

– Harmless (change in phenotype)

– Harmful (diabetes, cancer, heart disease, Huntington's disease, and hemophilia)

– Latent (variations found in coding and regulatory regions that are not harmful on their own; the change in each gene only becomes apparent under certain conditions, e.g. susceptibility to lung cancer)

Page 9

SNP facts

• SNPs are found in coding and (mostly) noncoding regions.

• They occur with a very high frequency: estimates range from about 1 in 1000 bases to 1 in 100 to 300 bases.

• The abundance of SNPs and the ease with which they can be measured make these genetic variations significant.

• SNPs close to a particular gene can act as markers for that gene.

Page 10

SNP maps

• Sequence genomes of a large number of people

• Compare the base sequences to discover SNPs.

• Generate a single map of the human genome containing all possible SNPs => SNP maps
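The comparison step can be sketched as a scan over aligned sequences for columns where individuals disagree. This is a toy version under strong assumptions: the sequences are already aligned and error-free, and both the function name and the data are made up.

```python
def candidate_snp_positions(aligned_seqs):
    """Return 0-based columns where aligned sequences disagree."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [
        pos for pos in range(length)
        if len({seq[pos] for seq in aligned_seqs}) > 1
    ]

# Made-up aligned sequences from three individuals.
seqs = [
    "ACGTACGT",
    "ACGTACGT",
    "ACCTACGA",
]
print(candidate_snp_positions(seqs))  # [2, 7]
```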

Page 11

How do we find sequence variations?

• look at multiple sequences from the same genome region

• use base quality values to decide if mismatches are true polymorphisms or sequencing errors
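As a rough illustration of how base quality values enter the decision, the sketch below assumes the qualities are Phred-scaled, so that a quality Q corresponds to an error probability of 10^(-Q/10). The acceptance cutoff and the helper names are assumptions made for this example only.

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled quality value Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)

def mismatch_is_credible(quality_a, quality_b, max_error=0.01):
    """Treat a mismatch between two reads as a candidate polymorphism only if
    both base calls have an error probability at or below `max_error`.
    The cutoff is an illustrative assumption, not a recommended value."""
    return (phred_to_error_prob(quality_a) <= max_error
            and phred_to_error_prob(quality_b) <= max_error)

print(phred_to_error_prob(20))        # about 0.01
print(mismatch_is_credible(30, 40))   # True
print(mismatch_is_credible(30, 10))   # False
```

A hard cutoff like this discards information; a probabilistic treatment combines the per-base error probabilities instead of thresholding them.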

Page 12

Automated polymorphism discovery

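$$
P(\mathrm{SNP}) = \sum_{\text{all variable } S}
\frac{\dfrac{P(S_1 \mid R_1)}{P_{\mathrm{Prior}}(S_1)} \cdots \dfrac{P(S_N \mid R_N)}{P_{\mathrm{Prior}}(S_N)} \, P_{\mathrm{Prior}}(S_1, \ldots, S_N)}
{\displaystyle \sum_{S_{i_1} \in [A,C,G,T]} \cdots \sum_{S_{i_N} \in [A,C,G,T]} \dfrac{P(S_{i_1} \mid R_1)}{P_{\mathrm{Prior}}(S_{i_1})} \cdots \dfrac{P(S_{i_N} \mid R_N)}{P_{\mathrm{Prior}}(S_{i_N})} \, P_{\mathrm{Prior}}(S_{i_1}, \ldots, S_{i_N})}
$$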

Marth et al. Nature Genetics 1999

Page 13

Large SNP mining projects

Sachidanandam et al. Nature 2001

[Figure: SNP mining from EST, WGS and BAC sequences against the genome reference; label: ~ 8 million]

Page 14

How to use markers to find disease?

• genotyping: using millions of markers simultaneously for an association study

genome-wide, dense SNP marker map

• depends on the patterns of allelic association in the human genome

• question: how to select from all available markers a subset that captures most mapping information (marker selection)

Page 15

Allelic association

• allelic association is the non-random assortment between alleles, i.e. it measures how well knowledge of the allele state at one site permits prediction at another

[Figure: a marker site and a functional site]

• by necessity, the strength of allelic association is measured between markers

• significant allelic association between a marker and a functional site permits localization (mapping) even without having the functional site in our collection

Page 16

Linkage disequilibrium

• LD measures the deviation from random assortment of the alleles at a pair of polymorphic sites

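$D = f(AB) - f(A) \times f(B)$, where $f(AB)$ is the frequency of haplotypes carrying allele $A$ at the first site and allele $B$ at the second, and $f(A)$, $f(B)$ are the individual allele frequencies

• other measures of LD are derived from D, e.g. by normalizing according to allele frequencies ($r^2$)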
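A small worked example may help here. The sketch below computes D as f(AB) - f(A) x f(B), writing A and B for one chosen allele at each of the two sites, and r^2 as D^2 / (f(A)(1 - f(A)) f(B)(1 - f(B))), which is the usual normalization by allele frequencies. The function name and the haplotype samples are invented for illustration.

```python
def ld_stats(haplotypes, allele_a, allele_b):
    """Compute D and r^2 for two biallelic sites from phased haplotypes.

    `haplotypes` is a list of (allele_at_site_1, allele_at_site_2) pairs.
    """
    n = len(haplotypes)
    f_a = sum(1 for h in haplotypes if h[0] == allele_a) / n
    f_b = sum(1 for h in haplotypes if h[1] == allele_b) / n
    f_ab = sum(1 for h in haplotypes if h == (allele_a, allele_b)) / n
    d = f_ab - f_a * f_b
    denom = f_a * (1 - f_a) * f_b * (1 - f_b)
    # r^2 is undefined when either site is monomorphic in the sample.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2

# Made-up samples of phased haplotypes.
perfect = [("A", "G")] * 5 + [("T", "C")] * 5
independent = [("A", "G"), ("A", "C"), ("T", "G"), ("T", "C")] * 3

print(ld_stats(perfect, "A", "G"))      # (0.25, 1.0)
print(ld_stats(independent, "A", "G"))  # (0.0, 0.0)
```

In the first sample the allele at one site fully predicts the allele at the other, so r^2 is 1; in the second, all four combinations are equally frequent and both D and r^2 are 0.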

Page 17

Haplotype diversity

• the most useful multi-marker measures of association are related to haplotype diversity

• random assortment of alleles at different sites: 2^n possible haplotypes for n markers

• strong association: most chromosomes carry one of a few common haplotypes (reduced haplotype diversity)
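The contrast between the 2^n haplotypes possible for n biallelic markers and the few actually observed under strong association can be shown by counting. The sample and the helper name below are made up for illustration.

```python
from collections import Counter

def haplotype_summary(haplotypes):
    """Compare observed haplotype diversity with the 2^n haplotypes possible
    under random assortment of n biallelic markers."""
    n_markers = len(haplotypes[0])
    counts = Counter(haplotypes)
    return {
        "possible": 2 ** n_markers,
        "observed": len(counts),
        "most_common": counts.most_common(2),
    }

# Made-up sample of 10 chromosomes typed at 4 biallelic markers.
sample = ["ACGT"] * 6 + ["TGCA"] * 3 + ["ACCA"]
print(haplotype_summary(sample))
# {'possible': 16, 'observed': 3, 'most_common': [('ACGT', 6), ('TGCA', 3)]}
```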

Page 18

Haplotype blocks

Daly et al. Nature Genetics 2001

• experimental evidence for reduced haplotype diversity (mainly in European samples)

Page 19

The promise for medical genetics

CACTACCGACACGACTATTTGGCGTAT

• within blocks a small number of SNPs are sufficient to distinguish the few common haplotypes, so significant marker reduction is possible

• if the block structure is a general feature of human variation structure, whole-genome association studies will be possible at a reduced genotyping cost

• this motivated the HapMap project

Gibbs et al. Nature 2003

Page 20

The HapMap initiative

• goal: to map out human allele and association structure at the kilobase scale

• deliverables: a set of physical and informational reagents

Page 21

Haplotyping

• the problem: the substrate for genotyping is diploid, genomic DNA; phasing of alleles at multiple loci is in general not possible with certainty

• experimental methods of haplotype determination (single-chromosome isolation followed by whole-genome PCR amplification, radiation hybrids, somatic cell hybrids) are expensive and laborious

[Figure: alleles A T C T G C C A]

Page 22

An example of haplotyping

• Mother GG AT CA TT

• Father CC AA AC CT

• Child GC AA CC CT

• Child GC AT AA TT

• Child GC AA AC CT

Page 23

Haplotypes

• Mother I: a = G A C T, b = G T A T

• Mother II: a = G T C T, b = G A A T

• Father I: a = C A A C, b = C A C T

• Father II: a = C A A T, b = C A C C

Page 24

An example of haplotyping

• Mother GG AT CA TT

• Father CC AA AC CT

• Child GC AA CC CT (M-Ia & F-IIb)

• Child GC AT AA TT (M-Ib & F-IIa)

• Child GC AA AC CT (M-Ia & F-Ia or M-IIb & F-IIb) ?
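The assignments in parentheses can be checked by brute force: try every pairing of one maternal and one paternal candidate haplotype and keep those that reproduce the child's genotype. The sketch below reads rows I and II of the Haplotypes slide as alternative phasings and columns a and b as the two haplotypes of each phasing, which is an interpretation of the slide; it also ignores recombination. The function names are invented.

```python
from itertools import product

# Candidate haplotypes from the Haplotypes slide (I and II are alternative phasings).
MOTHER = {"Ia": "GACT", "Ib": "GTAT", "IIa": "GTCT", "IIb": "GAAT"}
FATHER = {"Ia": "CAAC", "Ib": "CACT", "IIa": "CAAT", "IIb": "CACC"}

def genotype(hap1, hap2):
    """Unordered genotype at each site for a pair of haplotypes."""
    return tuple("".join(sorted(a + b)) for a, b in zip(hap1, hap2))

def explain_child(child):
    """List (mother haplotype, father haplotype) labels that reproduce
    the child's genotype, ignoring recombination."""
    target = tuple("".join(sorted(site)) for site in child.split())
    return [
        (m, f)
        for (m, m_hap), (f, f_hap) in product(MOTHER.items(), FATHER.items())
        if genotype(m_hap, f_hap) == target
    ]

for child in ["GC AA CC CT", "GC AT AA TT", "GC AA AC CT"]:
    print(child, explain_child(child))
# GC AA CC CT [('Ia', 'IIb')]
# GC AT AA TT [('Ib', 'IIa')]
# GC AA AC CT [('Ia', 'Ia'), ('IIb', 'IIb')]
```

The first two children each have exactly one explanation, while the third has two, matching the question mark on the slide.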

Page 25

HapMap Project

High-density SNP genotyping across the genome provides information about

– SNP validation, frequency, assay conditions

– correlation structure of alleles in the genome

A freely-available public resource to increase the power and efficiency of genetic association studies to medical traits

All data is freely available on the web for application in study design and analyses as researchers see fit

Page 26

HapMap Samples

• 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI)

• 90 individuals (30 trios) of European descent from Utah (CEU)

• 45 Han Chinese individuals from Beijing (CHB)

• 45 Japanese individuals from Tokyo (JPT)

Page 27

HapMap progress

PHASE I – completed, described in Nature paper

* 1,000,000 SNPs successfully typed in all 270 HapMap samples

PHASE II – data generation complete, data released

* >3,500,000 SNPs typed in total !!!

Page 28

ENCODE-HAPMAP variation project

• Ten “typical” 500kb regions

• 48 samples sequenced

• All discovered SNPs (and any others in dbSNP) typed in all 270 HapMap samples

• Current data set – 1 SNP every 279 bp

A much more complete variation resource by which the genome-wide map can be evaluated

Page 29

Tagging from HapMap

• Since HapMap describes the majority of common variation in the genome, choosing non-redundant sets of SNPs from HapMap offers considerable efficiency without power loss in association studies

Page 30

Pairwise tagging

Tags: SNP 1, SNP 3, SNP 6 (3 in total)

Test for association: SNP 1, SNP 3, SNP 6

[Figure: six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C) shown on example haplotypes, with three groups of SNPs in high r^2]

After Carlson et al. (2004) AJHG 74:106
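One simple way to pick pairwise tags is a greedy pass over an r^2 table: repeatedly choose the SNP that is in high r^2 with the most SNPs not yet covered. The sketch below is an illustration of that idea only and is not claimed to be the exact procedure of the cited paper. The r^2 values are invented so that six SNPs reduce to three tags, the 0.8 threshold is an assumption, and `greedy_pairwise_tags` is a hypothetical helper.

```python
def greedy_pairwise_tags(snps, r2, threshold=0.8):
    """Greedy pairwise tagging: repeatedly pick the SNP that is in high r^2
    (>= threshold) with the most still-untagged SNPs, then mark those as covered."""
    def linked(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    untagged = list(snps)
    tags = []
    while untagged:
        # Ties are broken in favour of the SNP that appears first in `untagged`.
        best = max(untagged, key=lambda s: sum(linked(s, t) for t in untagged))
        tags.append(best)
        untagged = [t for t in untagged if not linked(best, t)]
    return sorted(tags)

# Invented r^2 values for six SNPs; unlisted pairs are treated as r^2 = 0.
r2 = {
    frozenset((1, 2)): 0.95,
    frozenset((3, 4)): 0.90,
    frozenset((3, 5)): 0.92,
    frozenset((4, 5)): 0.88,
}
print(greedy_pairwise_tags([1, 2, 3, 4, 5, 6], r2))  # [1, 3, 6]
```

Only the three tags then need to be genotyped and tested for association; every other SNP is represented by a tag with which it has high r^2.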