Population Approaches to Detecting and Genotyping Copy Number Variation

Population Approaches to Detecting and Genotyping Copy Number Variation Lachlan Coin July 2010


Page 1: Population Approaches to Detecting and Genotyping Copy Number Variation

Population Approaches to Detecting and Genotyping Copy Number Variation

Lachlan Coin

July 2010

Page 2: Population Approaches to Detecting and Genotyping Copy Number Variation

Outline

• Population-haplotype approach to CNV detection and genotyping

• Application to SNP and CGH data

• Application to NGS sequence data

Page 3: Population Approaches to Detecting and Genotyping Copy Number Variation

cnvHap approach to CNV discovery and genotyping

Coin et al., Nature Methods 7, 541-546 (2010)

Page 4: Population Approaches to Detecting and Genotyping Copy Number Variation

Example of trained model

Page 5: Population Approaches to Detecting and Genotyping Copy Number Variation

cnvHap models haploid CN transitions

• Specify a per-base global transition rate matrix, indexed by copy number from (rows, 0-4) and copy number to (columns, 0-4), with entries q00, q10, …

• Rate matrix multiplied by a position-specific scalar rate

• Values trained using EM, following the approach of Klosterman et al., used in Xrate for finding substitution rates
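As a rough illustration of how a global rate matrix and a position-specific scalar rate could combine, the sketch below scales a 5-state rate matrix (haploid copy number 0-4) and converts it into transition probabilities by matrix exponentiation. The exponentiation step is an assumption borrowed from standard continuous-time Markov chain practice; the slide does not state it. The function names, the uniform off-diagonal rate and the distance are hypothetical values chosen only for the example.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, squarings=10, terms=12):
    """Matrix exponential by scaling-and-squaring with a truncated Taylor series."""
    n = len(m)
    scaled = [[x / (2 ** squarings) for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def make_rate_matrix(off_diag):
    """Set each diagonal entry so that every row sums to zero."""
    n = len(off_diag)
    q = [row[:] for row in off_diag]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def transition_probs(q, site_rate, distance_bp):
    """P = exp(Q * site_rate * distance); rows are 'from', columns are 'to'."""
    n = len(q)
    scaled = [[q[i][j] * site_rate * distance_bp for j in range(n)]
              for i in range(n)]
    return mat_exp(scaled)


# Hypothetical example: the same per-base rate between every pair of states.
off = [[0.0 if i == j else 1e-6 for j in range(5)] for i in range(5)]
Q = make_rate_matrix(off)
P = transition_probs(Q, site_rate=2.0, distance_bp=1000)
```

Each row of `P` sums to one, and a larger `site_rate` makes a copy number change between neighbouring positions more likely. In an EM setting the entries of `Q` and the per-position rates would be re-estimated at every iteration, which is not shown here.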

Page 6: Population Approaches to Detecting and Genotyping Copy Number Variation

cnvHap joint model of CNV + SNP haplotypes

Page 7: Population Approaches to Detecting and Genotyping Copy Number Variation

Cluster positions modelled using a linear model

(m_r(g), m_b(g)) = β * (f_0(g), …, f_5(g)), with basis functions

f_0(g) = 1

f_1(g) = log(CN(g)/2)

f_2(g) = (log(CN(g)/2))^2

f_3(g) = bfrac(g)

f_4(g) = bfrac(g) * (1 - bfrac(g))

f_5(g) = bfrac(g) * (bfrac(g) - 0.5) * (bfrac(g) - 1)

Model fitted using ridge regression, carried out at each iteration of the EM algorithm
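A minimal sketch of one ridge-regression step for a linear model of cluster positions is given below. It assumes the features are simple functions of copy number (CN) and B-allele fraction (bfrac), such as 1, log(CN/2), its square, bfrac, bfrac(1 - bfrac) and bfrac(bfrac - 0.5)(bfrac - 1). That feature set, the function names, the optional per-sample weights (standing in for EM posterior probabilities) and the target values are all assumptions made for illustration, not the cnvHap implementation.

```python
import math


def features(cn, bfrac):
    """Assumed basis functions of copy number and B-allele fraction (cn > 0)."""
    log_cn = math.log(cn / 2.0)
    return [1.0, log_cn, log_cn ** 2, bfrac,
            bfrac * (1.0 - bfrac), bfrac * (bfrac - 0.5) * (bfrac - 1.0)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(rows, targets, lam, weights=None):
    """Weighted ridge regression: solve (X'WX + lam*I) beta = X'Wy."""
    p = len(rows[0])
    w = weights or [1.0] * len(rows)
    xtx = [[sum(wi * r[i] * r[j] for wi, r in zip(w, rows))
            + (lam if i == j else 0.0) for j in range(p)] for i in range(p)]
    xty = [sum(wi * r[i] * t for wi, r, t in zip(w, rows, targets))
           for i in range(p)]
    return solve(xtx, xty)


# Hypothetical genotypes (copy number, B-allele fraction) and target cluster means.
genotypes = [(1, 0.0), (1, 1.0), (2, 0.0), (2, 0.5), (2, 1.0),
             (3, 0.0), (3, 1 / 3), (3, 2 / 3), (3, 1.0)]
X = [features(cn, bf) for cn, bf in genotypes]
targets = [math.log(cn / 2.0) for cn, _ in genotypes]
beta_weak = ridge_fit(X, targets, lam=1e-9)
beta_strong = ridge_fit(X, targets, lam=10.0)
```

With a very small penalty the fit recovers a coefficient near 1 on the log(CN/2) feature. A larger penalty shrinks all coefficients toward zero, which is what keeps the fit stable when some genotype clusters are sparsely populated. Copy number 0 would need separate handling because log(0) is undefined.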

Page 8: Population Approaches to Detecting and Genotyping Copy Number Variation

Using Illumina SNP arrays

Page 9: Population Approaches to Detecting and Genotyping Copy Number Variation

Combined Illumina and Agilent arrays

[Figure: three pairs of panels, each labelled Illumina and Agilent]

Page 10: Population Approaches to Detecting and Genotyping Copy Number Variation

Some CNVs exhibit shared structure

Page 11: Population Approaches to Detecting and Genotyping Copy Number Variation

Improved CNV genotyping accuracy

Cumulative Frequency of Squared Pearson Correlation

Page 12: Population Approaches to Detecting and Genotyping Copy Number Variation

A deletion at 16p11.2 in a patient with ‘extreme obesity’

• estimated by aCGH to be 546-700 kb

• flanked by segmental duplication (>99% sequence identity)

• probably arises by NAHR, implying deletion is 739 kb

• BMI = 29.2 kg·m⁻² at age 7½

• learning difficulties, delayed speech

[Figure: aCGH log2 ratio (y-axis, +1 to -3) plotted across chromosome 16 from 28.9 Mb to 30.7 Mb, with a chromosome 16 ideogram (p11.2 marked), MLPA probes and segmental duplications indicated]

RG Walters et al. Nature 463, 671-675 (2010) doi:10.1038/nature08727

Page 13: Population Approaches to Detecting and Genotyping Copy Number Variation

16p11.2 deletions in obesity and population cohorts

Cohort                                          Obese     Lean/Normal weight
British extreme early-onset obesity (SCOOP)     3/931     -
French child obesity case:control               4/643     0/530
French adult obesity case:control               4/705     0/669
Population cohorts (NFBC1966, CoLaus, EGPUT)    3/1592    1/6235
Swedish discordant siblings                     2/159     0/140
French bariatric surgery patients               2/141     -

Obesity: P = 5.8 × 10^-7, OR = 29.8 [3.9–225]

Morbid obesity: P = 6.4 × 10^-8, OR = 43.0 [5.6–329]

Page 14: Population Approaches to Detecting and Genotyping Copy Number Variation

Coverage affected by GC content

Page 15: Population Approaches to Detecting and Genotyping Copy Number Variation

Regression model fit to correct for GC bias
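The idea of regressing coverage on GC content and dividing out the fitted trend can be sketched as follows. This assumes a simple linear regression of per-window read depth on GC fraction; the slide does not say which regression form was used, and the function names and window values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def gc_correct(depth, gc):
    """Rescale each window's depth by the depth predicted from its GC fraction."""
    intercept, slope = fit_line(gc, depth)
    mean_depth = sum(depth) / len(depth)
    return [d * mean_depth / (intercept + slope * g)
            for d, g in zip(depth, gc)]


# Hypothetical windows whose depth depends only on GC content.
gc = [0.3, 0.4, 0.5, 0.6]
depth = [32.0, 36.0, 40.0, 44.0]
corrected = gc_correct(depth, gc)
```

In this toy case all variation in depth is explained by GC content, so every corrected value equals the mean depth of 38. With real data, the variation that remains after correction is the signal of interest for copy number.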

Page 16: Population Approaches to Detecting and Genotyping Copy Number Variation

Loess curves fit to remove residual spatial variation of coverage
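A loess fit of this kind can be sketched as a locally weighted linear regression with tricube weights, evaluated at each position. The version below is a simplified, assumed form (fixed neighbour fraction, no robustness iterations), and the function name, `frac` parameter and input values are hypothetical.

```python
def loess(x, y, frac=0.5):
    """Local linear regression with tricube weights, evaluated at every x."""
    n = len(x)
    k = max(3, int(round(frac * n)))  # number of neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest point (including x0 itself).
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
        mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
                  for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted


# Hypothetical positions with a purely linear spatial trend in coverage.
positions = [float(i) for i in range(10)]
coverage = [2.0 * p + 1.0 for p in positions]
trend = loess(positions, coverage, frac=0.5)
residual = [c - t for c, t in zip(coverage, trend)]
```

Because the toy trend is exactly linear, the local fits reproduce it and the residuals are zero. On real coverage data, the residuals would be what is passed on to CNV detection.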

Page 17: Population Approaches to Detecting and Genotyping Copy Number Variation

Detecting CNVs with NGS data

[Figure panels: depth/haploid coverage; B-allele frequency]

Page 18: Population Approaches to Detecting and Genotyping Copy Number Variation

NGS versus CGH data

NGS data chrom1:350mb-351mb CGH data chrom1:350mb-351mb

Page 19: Population Approaches to Detecting and Genotyping Copy Number Variation

NGS vs CGH data

Page 20: Population Approaches to Detecting and Genotyping Copy Number Variation

Haplotype structure of deletion

Page 21: Population Approaches to Detecting and Genotyping Copy Number Variation

NGS amplification

[Figure axis: depth/coverage]

Page 22: Population Approaches to Detecting and Genotyping Copy Number Variation

With consistent break-points in population

Page 23: Population Approaches to Detecting and Genotyping Copy Number Variation

Polyploid phasing and imputation

[Figure axes: imputation error rate; switch error rate]
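For readers unfamiliar with the two metrics, the sketch below computes them in their common diploid form. The definitions used for ploidy > 2 in the talk are not given in the transcript, so the function names, the restriction to a single haplotype of a diploid sample and the example strings are assumptions for illustration only.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Fraction of adjacent heterozygous-site pairs where the phase flips.

    Both arguments describe one haplotype of a diploid sample as a string
    of 0/1 alleles at heterozygous sites.
    """
    agree = [t == i for t, i in zip(true_hap, inferred_hap)]
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    return switches / (len(agree) - 1)


def imputation_error_rate(true_genotypes, imputed_genotypes):
    """Fraction of masked genotypes that were imputed incorrectly."""
    wrong = sum(1 for t, i in zip(true_genotypes, imputed_genotypes) if t != i)
    return wrong / len(true_genotypes)


# Hypothetical example: the inferred haplotype switches phase once, after site 3.
switch_rate = switch_error_rate("010110", "010001")
# Hypothetical example: one of four masked genotypes is imputed incorrectly.
imputation_rate = imputation_error_rate([0, 1, 2, 1], [0, 1, 1, 1])
```

A single switch corrupts every downstream site in a site-by-site comparison, which is why switch error is counted per adjacent pair.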

Page 24: Population Approaches to Detecting and Genotyping Copy Number Variation

Conclusions

• Population-haplotype model enables joint CNV discovery and genotyping using array data

• Preliminary results indicate that this approach will also help with NGS data

• Combining information from multiple platforms improves sensitivity

• Imputation still works for ploidy > 2, but phasing becomes more difficult

Page 25: Population Approaches to Detecting and Genotyping Copy Number Variation

Acknowledgements

Evangelos Bellos

Shu-Yi Su

Robin Walters

Julian Asher

Alex Blakemore

Adam de Smith

Philippe Froguel

Julia El-Sayed Moustafa

David Balding (UCL)

Rob Sladek (McGill)