Population Approaches to Detecting and Genotyping Copy Number Variation
Lachlan Coin
July 2010
Outline
• Population-haplotype approach to CNV detection and genotyping
• Application to SNP and CGH data
• Application to NGS sequence data
cnvHap approach to CNV discovery and genotyping
Coin et al., Nature Methods 7, 541-546 (2010)
Example of trained model
cnvHap models haploid CN transitions
• Specify a per-base global transition rate matrix
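[Figure: rate matrix with rows indexed by the copy number transitioned from (0 to 4) and columns by the copy number transitioned to (0 to 4); entries shown as q00, q10, ….]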
• Rate matrix multiplied by a position-specific scalar rate
• Values trained using EM, following the approach of Klosterman et al., used in XRate for finding substitution rates
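A minimal sketch of how a global rate matrix and a position-specific scalar could combine into copy-number transition probabilities. It assumes the standard continuous-time Markov chain form P = exp(Q * rate * distance); the slides do not state this formula, and the rate values, the function names and the 0 to 4 state space below are illustrative assumptions, not the trained cnvHap parameters.

```python
def make_rate_matrix(off_diagonal):
    """Copy the off-diagonal rates and set each diagonal so rows sum to zero."""
    n = len(off_diagonal)
    q = [[float(off_diagonal[i][j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    return q


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, squarings=10, terms=12):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes scaled^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_probs(q, scalar_rate, distance):
    """P[i][j]: probability of moving from copy number i to j over `distance` bases."""
    scaled = [[x * scalar_rate * distance for x in row] for row in q]
    return mat_exp(scaled)


# Hypothetical per-base rates: single-copy changes are more likely than larger jumps.
EXAMPLE_RATES = [[0.0 if i == j else (1e-5 if abs(i - j) == 1 else 1e-6)
                  for j in range(5)] for i in range(5)]

if __name__ == "__main__":
    q = make_rate_matrix(EXAMPLE_RATES)
    for row in transition_probs(q, scalar_rate=1.0, distance=1000):
        print(["%.4f" % p for p in row])
```

A larger position-specific scalar makes a change of copy number between neighbouring probes more likely, which is one way to express CNV hotspots without retraining the global matrix.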
cnvHap joint model of CNV + SNP haplotypes
Cluster positions modelled using a linear model
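\[
\begin{pmatrix} r_m(g) \\ b_m(g) \end{pmatrix} = \beta \, \mathbf{f}(g), \qquad \mathbf{f}(g) = \bigl(f_0(g), \dots, f_5(g)\bigr)^{\top}
\]
\[
\begin{aligned}
f_0(g) &= 1 \\
f_1(g) &= \log_2\bigl(CN(g)/2\bigr) \\
f_2(g) &= \bigl(\log_2(CN(g)/2)\bigr)^2 \\
f_3(g) &= bfrac(g) \\
f_4(g) &= bfrac(g) \cdot \bigl(1 - bfrac(g)\bigr) \\
f_5(g) &= bfrac(g) \cdot \bigl(bfrac(g) - 0.5\bigr) \cdot \bigl(bfrac(g) - 1\bigr)
\end{aligned}
\]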
Model fitted using ridge regression, carried out at each iteration of the EM algorithm
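A minimal sketch of a ridge-regression fit of the kind that could sit inside an M-step. It assumes each (sample, genotype) pair contributes one row, weighted by the posterior probability of that genotype from the E-step; that weighting scheme, the reduced feature set in `features`, and all names below are illustrative assumptions rather than the cnvHap implementation.

```python
import math


def features(cn, bfrac):
    """Hypothetical feature vector for a genotype with copy number cn > 0."""
    log_ratio = math.log2(cn / 2)
    return [1.0, log_ratio, log_ratio ** 2, bfrac]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def weighted_ridge(xs, ys, weights, lam):
    """Minimise sum_i w_i (y_i - x_i . beta)^2 + lam * |beta|^2.

    For simplicity the intercept is penalised along with the other terms.
    """
    p = len(xs[0])
    xtwx = [[sum(w * x[j] * x[k] for x, w in zip(xs, weights))
             + (lam if j == k else 0.0) for k in range(p)] for j in range(p)]
    xtwy = [sum(w * x[j] * y for x, y, w in zip(xs, ys, weights))
            for j in range(p)]
    return solve(xtwx, xtwy)


if __name__ == "__main__":
    genotypes = [(1, 0.0), (1, 1.0), (2, 0.0), (2, 0.5),
                 (2, 1.0), (3, 0.0), (3, 1 / 3), (4, 0.5)]
    xs = [features(cn, b) for cn, b in genotypes]
    true_beta = [0.1, 0.8, -0.05, 0.02]          # made-up coefficients
    ys = [sum(bj * xj for bj, xj in zip(true_beta, x)) for x in xs]
    posteriors = [1.0] * len(xs)                 # stand-in for E-step weights
    print(weighted_ridge(xs, ys, posteriors, lam=0.1))
```

The penalty keeps the fit stable when some copy-number states are rarely or never observed at a probe, which is the usual motivation for ridge over ordinary least squares in this setting.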
Using Illumina SNP arrays
[Figure panels: Illumina and Agilent, three pairs]
Combined Illumina and Agilent arrays
Some CNVs exhibit shared structure
Improved CNV genotyping accuracy
Cumulative Frequency of Squared Pearson Correlation
A deletion at 16p11.2 in a patient with ‘extreme obesity’
• estimated by aCGH to be 546 kb to 700 kb
• flanked by segmental duplication (>99% sequence identity)
• probably arises by NAHR, implying deletion is 739 kb
• BMI = 29.2 kg/m² at age 7½
• learning difficulties, delayed speech
[Figure: log2 ratio (axis from +1 to -3) across chromosome 16 from 28.9 Mb to 30.7 Mb (band p11.2), with tracks for MLPA probes and segmental duplication, alongside a chromosome 16 ideogram]
RG Walters et al. Nature 463, 671-675 (2010) doi:10.1038/nature08727
16p11.2 deletions in obesity and population cohorts
Cohort                                          Obese     Lean/Normal Weight
British extreme early-onset obesity (SCOOP)     3/931     -
French child obesity case:control               4/643     0/530
French adult obesity case:control               4/705     0/669
Population cohorts (NFBC1966, CoLaus, EGPUT)    3/1592    1/6235
Swedish discordant siblings                     2/159     0/140
French bariatric surgery patients               2/141     -
Obesity: P = 5.8 × 10^-7, OR = 29.8 [3.9–225]
Morbid obesity: P = 6.4 × 10^-8, OR = 43.0 [5.6–329]
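As a worked piece of arithmetic, the sketch below pools the carrier counts listed for each cohort and computes a crude (unstratified) odds ratio. The slide does not state how its odds ratios and P values were derived, so this pooled estimate need not reproduce them; the pooling itself and the names used are assumptions made for illustration.

```python
# (carriers, total) per cohort, as listed on the slide; None marks a missing group.
COUNTS = {
    "British extreme early-onset obesity (SCOOP)": ((3, 931), None),
    "French child obesity case:control": ((4, 643), (0, 530)),
    "French adult obesity case:control": ((4, 705), (0, 669)),
    "Population cohorts (NFBC1966, CoLaus, EGPUT)": ((3, 1592), (1, 6235)),
    "Swedish discordant siblings": ((2, 159), (0, 140)),
    "French bariatric surgery patients": ((2, 141), None),
}


def pooled(counts, column):
    """Sum carriers and totals over cohorts for column 0 (obese) or 1 (lean)."""
    carriers = total = 0
    for groups in counts.values():
        if groups[column] is not None:
            carriers += groups[column][0]
            total += groups[column][1]
    return carriers, total


def crude_odds_ratio(case_carriers, case_total, ctrl_carriers, ctrl_total):
    """Odds of carrying the deletion in cases divided by the odds in controls."""
    case_odds = case_carriers / (case_total - case_carriers)
    ctrl_odds = ctrl_carriers / (ctrl_total - ctrl_carriers)
    return case_odds / ctrl_odds


if __name__ == "__main__":
    obese = pooled(COUNTS, 0)
    lean = pooled(COUNTS, 1)
    print("obese carriers/total:", obese)
    print("lean carriers/total:", lean)
    print("crude pooled OR: %.1f" % crude_odds_ratio(*obese, *lean))
```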
Coverage affected by GC content
Regression model fit to correct for GC bias
Loess curves fit to remove residual spatial variation of coverage
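A rough sketch of the two correction steps named on these slides: a regression of coverage on GC content, followed by a loess-style smoother along the chromosome. The slides do not give the form of either model, so the straight-line GC fit, the tricube local-linear smoother, the `span` parameter and the function names are all assumptions.

```python
import math


def fit_line(xs, ys, ws=None):
    """Weighted least-squares line; returns (intercept, slope)."""
    if ws is None:
        ws = [1.0] * len(xs)
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


def gc_correct(coverage, gc):
    """Divide out the coverage predicted from GC content, keeping the overall mean.

    Assumes the fitted line stays positive over the observed GC range.
    """
    intercept, slope = fit_line(gc, coverage)
    mean_cov = sum(coverage) / len(coverage)
    return [c * mean_cov / (intercept + slope * g) for c, g in zip(coverage, gc)]


def loess(xs, ys, span=0.5):
    """Local linear regression with tricube weights over the nearest points."""
    n = len(xs)
    k = max(2, math.ceil(span * n))
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0          # bandwidth: distance to the k-th neighbour
        ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        intercept, slope = fit_line(xs, ys, ws)
        fitted.append(intercept + slope * x0)
    return fitted


def remove_spatial_trend(positions, values, span=0.5):
    """Divide values by their smoothed trend along the chromosome."""
    trend = loess(positions, values, span)
    mean_val = sum(values) / len(values)
    return [v * mean_val / t for v, t in zip(values, trend)]


if __name__ == "__main__":
    gc = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
    cov = [22.0, 25.5, 27.0, 30.5, 31.0, 34.5, 36.0]   # made-up window depths
    flat = gc_correct(cov, gc)
    print(remove_spatial_trend(list(range(len(flat))), flat, span=0.6))
```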
Detecting CNVs with NGS data
[Figure panels: depth/haploid coverage; B-allele frequency]
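The two NGS signals on this slide can be illustrated with a small sketch: depth divided by haploid coverage as a copy-number estimate, and the fraction of reads carrying the B allele. It assumes haploid coverage means the expected depth contributed by one copy and that B-allele reads are the non-reference reads; the naive rounding in `call_genotype` is a stand-in for illustration, not the population-haplotype model described in the talk.

```python
def copy_number_estimate(depth, haploid_coverage):
    """Depth in units of the expected depth of a single copy."""
    return depth / haploid_coverage


def b_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads supporting the B (here: non-reference) allele."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else None


def expected_bafs(cn):
    """B-allele frequencies expected with 0..cn copies of B out of cn copies."""
    return [k / cn for k in range(cn + 1)] if cn > 0 else []


def call_genotype(depth, baf, haploid_coverage, max_cn=4):
    """Naive per-site call: (copy number, number of B alleles)."""
    cn = round(copy_number_estimate(depth, haploid_coverage))
    cn = max(0, min(max_cn, cn))
    if cn == 0 or baf is None:
        return cn, 0
    return cn, round(baf * cn)


if __name__ == "__main__":
    baf = b_allele_frequency(ref_reads=31, alt_reads=14)
    print(expected_bafs(3))
    print(call_genotype(depth=45, baf=baf, haploid_coverage=15))
```

Per-site calls like these are noisy at typical sequencing depths, which is the motivation for sharing information across neighbouring sites and across samples.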
NGS versus CGH data
[Figure panels: NGS data, chrom1:350mb-351mb; CGH data, chrom1:350mb-351mb]
NGS vs CGH data
Haplotype structure of deletion
NGS amplification Depth/coverage
With consistent break-points in population
Polyploid phasing and imputation
[Figure axes: imputation error rate; switch error rate]
Conclusions
• Population-haplotype model enables joint CNV discovery and genotyping using array data
• Preliminary results indicate the same model will also help with NGS data
• Combining information from multiple platforms improves sensitivity
• Imputation still works for ploidy > 2, but phasing becomes more difficult
Acknowledgements
Evangelos Bellos
Shu-Yi Su
Robin Walters
Julian Asher
Alex Blakemore
Adam de Smith
Philippe Froguel
Julia El-Sayed Moustafa
David Balding (UCL)
Rob Sladek (McGill)