Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik...
-
Upload
blaze-randall -
Category
Documents
-
view
230 -
download
0
Transcript of Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik...
![Page 1: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/1.jpg)
Copy-number estimation on the latest generation of high-density
oligonucleotide microarrays
Henrik Bengtsson(work with Terry Speed)
Dept of Statistics, UC Berkeley
January 24, 2008
Postdoctoral Seminars, Mathematical Biosciences Institute, The Ohio State University
![Page 2: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/2.jpg)
Acknowledgments
UC Berkeley:James BullardKasper HansenElizabeth PurdomTerry Speed
WEHI, Melbourne:Mark RobinsonKen Simpson
ISREC, Lausanne:“Asa” Wirapati
John Hopkins:Benilton CarvalhoRafael Irizarry
Affymetrix, California:Ben BolstadSimon CawleyLuis JevonsChuck SugnetJim Veitch
![Page 3: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/3.jpg)
Size = 264 kb, Number of loci = 72
Copy number analysis is about finding "aberrations" in a person's genome.
![Page 4: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/4.jpg)
Single Nucleotide Polymorphisms (SNPs)make us unique
Definition:A sequence variation such that two genomes may differ by a single nucleotide (A, T, C, or G).
Allele A: A...CGTAGCCATCGGTA/GTACTCAATGATAG...
Allele B: G
A person has either genotype AA, AB, or BB at this SNP.
![Page 5: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/5.jpg)
Human Genetic Variation:Breakthrough of the Year 2007 (Science)• 3 billion DNA bases.• First sequenced 2001.
• HapMap: 270 individuals genotyped.3 million known SNPs (places where one base differ from one person to another). Estimate: 15 million SNPs.
• Genomewide association studies takeover (over linkage analysis).
• Copy Number Polymorphism:- 1,000s to millions of bases lost or added.- Estimate: 20% of differences in gene activity are due to copy-number variants; SNPs (genotypes) account for the rest.
• January 22, 2008: The 3-year "1,000 Genomes Project" will sequence 1,000 individuals. This follows the HapMap Project (SNPs).
![Page 6: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/6.jpg)
Objectives of this presentation
• Total copy number estimation/segmentation
• Estimate single-locus CNs well(segmentation methods take it from there)
• All generations of Affymetrix SNP arrays:– SNP chips: 10K, 100K, 500K– SNP & CN chips: 5.0, 6.0
• Small and very large data sets
![Page 7: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/7.jpg)
Available in aroma.affymetrix
“Infinite” number of arrays: 1-1,000sRequirements: 1-2GB RAMArrays: SNP, exon, expression, (tiling).Dynamic HTML reportsImport/export to existing methodsOpen source: RCross platform: Windows, Linux, Mac
![Page 8: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/8.jpg)
Affymetrix chips
![Page 9: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/9.jpg)
Running the assaytake 4-5 working days
1. Start with target gDNA (genomic DNA) or mRNA.
2. Obtain labeled single-stranded target DNA fragments for hybridization to the probes on the chip.
3. After hybridization, washing, and scanning we get a digital image.
4. Image summarized across pixels to probe-level intensities before we begin. Thisis our "raw data".
![Page 10: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/10.jpg)
Restriction enzymes digest the DNA, which is then amplified and hybridized
![Page 11: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/11.jpg)
The Affymetrix GeneChip is a synthesized high-density 25-mer microarray
1.28 cm
6.5 million probes/ chip
1.28 cm
*
5 µm
5 µm
> 1 million identical 25bp sequences
* ***
![Page 12: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/12.jpg)
Target DNA find their way to complementary probes by massive parallel hybridization
![Page 13: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/13.jpg)
DAT File(s)[Image, pixel intensities]
Hybridization+ Scanning
Image analysis
CEL File(s)[Probe Cell Intensity]
CDF [Chip Description File]+
Pre-processing
workable raw data
Segmentation
![Page 14: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/14.jpg)
Affymetrix copy-number & genotyping arrays
![Page 15: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/15.jpg)
Terminology
Target sequence: ...CGTAGCCATCGGTAAGTACTCAATGATAG... |||||||||||||||||||||||||
Perfect match (PM): ATCGGTAGCCATTCATGAGTTACTA
25 nucleotides
* *
* **
PM
Target seq.
* **
other PMs
Other DNA Other DNA Other seq.
![Page 16: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/16.jpg)
Copy-number probes are used to quantifythe amount of DNA at known loci
CN locus: ...CGTAGCCATCGGTAAGTACTCAATGATAG...PM: ATCGGTAGCCATTCATGAGTTACTA
* **
PM = c
CN=1* **
PM = 2¢c
CN=2* **
PM = 3¢c
CN=3
![Page 17: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/17.jpg)
Raw copy numbers- log-ratios relative to a reference
From the preprocessing, we obtain for sample i=1,2,...,I, CN locus j=1,2,...,J:
Observed signals: (i1, i2, ..., iJ)
These are not absolute copy-number levels. In order to interpret these, we compare each of them to a reference "R", i.e. ij / Rj, but even better "raw copy numbers":
Mij = log2 (ij / Rj) = log
2(ij) - log2(Rj)
The reference can be from normal tissue, or from a pool of normal samples.
![Page 18: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/18.jpg)
Copy number regions are found by lining up estimates along the chromosome
Even without a segmentation algorithm,we can easily spot a deletion here.
Example: Log-ratios for one sample on Chromosome 22.
![Page 19: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/19.jpg)
Single Nucleotide Polymorphisms (SNPs)make us unique
Definition:A sequence variation such that two genomes may differ by a single nucleotide (A, T, C, or G).
Allele A: A...CGTAGCCATCGGTA/GTACTCAATGATAG...
Allele B: G
A person has either genotype AA, AB, or BB at this SNP.
![Page 20: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/20.jpg)
Affymetrix probes for a SNP- can be used for genotyping
PMA: ATCGGTAGCCATTCATGAGTTACTAAllele A: ...CGTAGCCATCGGTAAGTACTCAATGATAG...
Allele B: ...CGTAGCCATCGGTAGGTACTCAATGATAG...PMB: ATCGGTAGCCATCCATGAGTTACTA
* **
PMA >> PMB
AA* **
*
* **
PMA << PMB
* **
BB* **
PMA ¼ PMB
AB* **
![Page 21: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/21.jpg)
SNPs can also be used forestimating copy numbers
AA* **
PM = PMA + PMB = 2c
* **
* **
PM = PMA + PMB = 2c
AB* **
*
* **
PM = PMA + PMB = 2c
* **
BB
* **
PM = PMA + PMB = 3c
AAB* **
![Page 22: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/22.jpg)
Combing CN estimates from SNPs and CN probes means higher resolution
SNPs + CN probes
![Page 23: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/23.jpg)
A brief history...
![Page 24: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/24.jpg)
Genome-Wide Human SNP Array 6.0is the state-of-the-art array
• > 906,600 SNPs:– Unbiased selection of 482,000 SNPs:
historical SNPs from the SNP Array 5.0 (== 500K)– Selection of additional 424,000 SNPs:
• Tag SNPs• SNPs from chromosomes X and Y• Mitochondrial SNPs• Recent SNPs added to the dbSNP database• SNPs in recombination hotspots
• > 946,000 copy-number probes:– 202,000 probes targeting 5,677 CNV regions from the Toronto
Database of Genomic Variants. Regions resolve into 3,182 distinct, non-overlapping segments; on average 61 probe sets per region
– 744,000 probes, evenly spaced along the genome
![Page 25: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/25.jpg)
How did we get here?
Data from 2003 on Chr22 (on of the smaller chromosomes)
![Page 26: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/26.jpg)
2003: 10,000 loci x1
![Page 27: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/27.jpg)
2004: 100,000 loci x10
![Page 28: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/28.jpg)
2005: 500,000 loci x50
![Page 29: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/29.jpg)
2006: 900,000 loci x90
![Page 30: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/30.jpg)
2007: 1,800,000 loci x180
![Page 31: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/31.jpg)
Rapid increase in density
6.0
5.0
500K
100K
10K
1.6kb
3.6kb
6.0kb
26kb
4£ further out…
294kb
year
# loci
Distance between loci:
next?
2003 2004 2005 2006 2007
![Page 32: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/32.jpg)
Affymetrix & Illumina are competing- we get more bang for the buck (cup)
10K 100K 500K 5.0 6.0
Released July 2003 April 2004 Sept 2005 Feb 2007 May 2007
# SNPs 10,204 116,204 500,568 500,568 934,946
# CNPs - - - 340,742 946,371
# loci 10,204 116,204 500,568 841,310 1,878,317
Distance 294kb 25.8kb 6.0kb 3.6kb 1.6kb
Price / chip set 65 USD 400 USD 260 USD 175 USD 300 USD
# loci / cup of espresso ($1.35)
116 loci 216 loci 1426 loci 3561 loci 4638 loci
Price source: Affymetrix Pricing Information, http://www.affymetrix.com/, January 2008.
![Page 33: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/33.jpg)
Preprocessing forcopy-number analysis
Copy-number estimation using
Robust Multichip Analysis (CRMA)
![Page 34: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/34.jpg)
Copy-number estimation using Robust Multichip Analysis (CRMA)
CRMA
Preprocessing(probe signals)
allelic crosstalk (or quantile)
Total CN PM = PMA + PMB
Summarization (SNP signals )
log-additivePM only
Post-processing fragment-length
(GC-content)
Raw total CNs R = Reference
Mij = log2(ij /Rj) chip i, probe j
![Page 35: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/35.jpg)
Crosstalk between alleles adds significant artifacts to signals
CRMA
Preprocessing(probe signals)
allelic crosstalk (quantile)
Total CNs PM=PMA+PMB
Summarization (SNP signals )
log-additive(PM-only)
Post-processing fragment-length
(GC-content)
Raw total CNs Mij = log2(ij/Rj)
Cross-hybridization:
Allele A: TCGGTAAGTACTCAllele B: TCGGTATGTACTC
AA* **
PMA >> PMB
* **
* **
PMA ¼ PMB
AB* ** *
* **
PMA << PMB
* **
BB
![Page 36: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/36.jpg)
AA
BBAB
Crosstalk between alleles is easy to spot
CRMA
Preprocessing(probe signals)
allelic crosstalk (quantile)
Total CNs PM=PMA+PMB
Summarization (SNP signals )
log-additive(PM-only)
Post-processing fragment-length
(GC-content)
Raw total CNs Mij = log2(ij/Rj)
offset
+
PMB
PMA
![Page 37: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/37.jpg)
Crosstalk between alleles can be estimated and corrected for
CRMA
Preprocessing(probe signals)
allelic crosstalk (quantile)
Total CNs PM=PMA+PMB
Summarization (SNP signals )
log-additive(PM-only)
Post-processing fragment-length
(GC-content)
Raw total CNs Mij = log2(ij/Rj)
PMB
PMA
![Page 38: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/38.jpg)
Before removing crosstalk the arrays differ significantly...
CRMA
Preprocessing(probe signals)
allelic crosstalk (quantile)
Total CNs PM=PMA+PMB
Summarization (SNP signals )
log-additive(PM-only)
Post-processing fragment-length
(GC-content)
Raw total CNs Mij = log2(ij/Rj)
Crosstalk calibration corrects for differences in distributions too
log2 PM
![Page 39: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/39.jpg)
When removing crosstalk systemdifferences between arrays goes away
CRMA
Preprocessing(probe signals)
allelic crosstalk (quantile)
Total CNs PM=PMA+PMB
Summarization (SNP signals )
log-additive(PM-only)
Post-processing fragment-length
(GC-content)
Raw total CNs Mij = log2(ij/Rj)
Crosstalk calibration corrects for differences in distributions too
log2 PM
![Page 40: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/40.jpg)
How can a translation and a rescaling make such a big difference?
Four measurements of the same thing:
log2 PM
log2 PM
With different scales:log(b*PM) = log(b)+log(PM)
log2 PM
With different scalesand some offset: log(a+b*PM) = ...
![Page 41: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/41.jpg)
Copy-number estimation using Robust Multichip Analysis (CRMA)
CRMA
Preprocessing(probe signals)
allelic crosstalk (quantile)
Total CNs PM=PMA+PMB
Summarization (SNP signals )
log-additive(PM-only)
Post-processing fragment-length
(GC-content)
Raw total CNs Mij = log2(ij/Rj)
AA* **
PM = PMA + PMB
* **
* **
PM = PMA + PMB
AB* **
*
* **
PM = PMA + PMB
* **
BB
![Page 42: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/42.jpg)
For robustness (against outliers), there are multiple probes per SNP
Genotype AA
PMA
PMB
1 2 3 4 5 6 7
Genotype AB
1 2 3 4 5 6 7
PMA
PMB
Genotype BB
1 2 3 4 5 6 7
PMA
PMB
![Page 43: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/43.jpg)
Copy-number estimation using Robust Multichip Analysis (CRMA)
CRMA
Preprocessing(probe signals)
allelic crosstalk (quantile)
Total CNs PM=PMA+PMB
Summarization (SNP signals )
log-additive(PM-only)
Post-processing fragment-length
(GC-content)
Raw total CNs Mij = log2(ij/Rj)
The log-additive model:
log2(PMijk) = log2ij + log2jk + ijk
sample i, SNP j, probe k.
Fit using robust linear models (rlm)
![Page 44: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/44.jpg)
Probe-level summarization- probe affinity model
For a particular SNP, the total CN signal for sample i=1,2,...,I is: i
Which we observe via K probe signals: (PMi1, PMi2, ..., PMiK)
rescaled by probe affinities: (1, 2, ..., K)
A model for the observed PM signals is then:
PMik = k * i + ik
where ik is noise.
![Page 45: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/45.jpg)
Probe-level summarization- the log-additive model
For one SNP, the model is:
PMik = k * i + ik
Take the logarithm on both sides:
log2(PMik) = log2(k * i + ik)
¼ log2(k * i)+ ik
= log2k + log2i + ik
Sample i=1,2,...,I, and probe k=1,2,...,K.
![Page 46: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/46.jpg)
Probe-level summarization- the log-additive model
With multiple arrays i=1,2,...,I, we can estimate the probe-affinity parameters {k} and therefore also the "chip effects" {i} in the model:
log2(PMik) = log2k + log2i + ik
Conclusion: We have summarized
signals (PMAk,PMBk) for probes k=1,2,...,K
into one signal i per sample.
![Page 47: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/47.jpg)
100K
Copy-number estimation using Robust Multichip Analysis (CRMA)
CRMA
Preprocessing(probe signals)
allelic crosstalk (quantile)
Total CNs PM=PMA+PMB
Summarization (SNP signals )
log-additive(PM-only)
Post-processing fragment-length
(GC-content)
Raw total CNs Mij = log2(ij/Rj)
Longer fragments ) less amplified by PCR ) weaker SNP signals
![Page 48: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/48.jpg)
500K
Copy-number estimation using Robust Multichip Analysis (CRMA)
CRMA
Preprocessing(probe signals)
allelic crosstalk (quantile)
Total CNs PM=PMA+PMB
Summarization (SNP signals )
log-additive(PM-only)
Post-processing fragment-length
(GC-content)
Raw total CNs Mij = log2(ij/Rj)
Longer fragments ) less amplified by PCR ) weaker SNP signals
![Page 49: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/49.jpg)
Copy-number estimation using Robust Multichip Analysis (CRMA)
CRMA
Preprocessing(probe signals)
allelic crosstalk (quantile)
Total CNs PM=PMA+PMB
Summarization (SNP signals )
log-additive(PM-only)
Post-processing fragment-length
(GC-content)
Raw total CNs Mij = log2(ij/Rj)
Normalize to get samefragment-length effect for all hybridizations
![Page 50: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/50.jpg)
Copy-number estimation using Robust Multichip Analysis (CRMA)
CRMA
Preprocessing(probe signals)
allelic crosstalk (quantile)
Total CNs PM=PMA+PMB
Summarization (SNP signals )
log-additive(PM-only)
Post-processing fragment-length
(GC-content)
Raw total CNs Mij = log2(ij/Rj)
Normalize to get samefragment-length effect for all hybridizations
![Page 51: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/51.jpg)
Copy-number estimation using Robust Multichip Analysis (CRMA)
CRMA
Preprocessing(probe signals)
allelic crosstalk (quantile)
Total CNs PM=PMA+PMB
Summarization (SNP signals )
log-additive(PM-only)
Post-processing fragment-length
(GC-content)
Raw total CNs Mij = log2(ij/Rj)
![Page 52: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/52.jpg)
Results(comparing with other methods)
![Page 53: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/53.jpg)
Other methods
CRMA dChip(Li & Wong 2001)
CNAG(Nannya et al 2005)
CNAT v4(Affymetrix 2006)
Preprocessing(probe signals)
allelic crosstalk (quantile)
invariant-set scale quantile
Total CNs PM=PMA+PMB PM=PMA+PMB
MM=MMA+MMB
PM=PMA+PMB =A+B
Summarization (SNP signals )
log-additive(PM-only)
multiplicative(PM-MM)
sum (PM-only)
log-additive (PM-only)
Post-processing fragment-length
(GC-content)
- fragment-length
GC-content
fragment-length
GC-content
Raw total CNs Mij = log2(ij/Rj) Mij = log2(ij/Rj) Mij = log2(ij/Rj) Mij = log2(ij/Rj)
![Page 54: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/54.jpg)
How well can be differentiate between one and two copies?
HapMap (CEU):Mapping250K Nsp data (one half of the "500K")30 males and 29 females (no children; one excl. female)
Chromosome X is known: Males (CN=1) & females (CN=2)5,608 SNPs
Classification rule:
Mij < threshold ) CNij =1, otherwise CNij =2.Number of calls: 595,608 = 330,872
![Page 55: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/55.jpg)
Classification rule for loci on X - use raw CNs to call CN=1 or CN=2
Classification rule:
Mij < threshold ) CNij=1, else CNij=2.
Number of calls per locus (SNP): 59 (one per samples)
Across Chromosome X: 59 5,608 loci = 330,872
CN=1
CN=2
![Page 56: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/56.jpg)
Calling samples for SNP_A-1920774
# males: 30# females: 29
Call rule:If Mi < threshold, a male
Calling a male male:#True-positives: 30 TP rate: 30/30 = 100%
Calling a female male:#False-positive : 5FP rate: 5/29 = 17%
![Page 57: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/57.jpg)
Receiver Operator Characteristic (ROC)
FP rate(incorrectly calling females male)
TP rate
(correctly calling a males male)
increasingthreshold
²
(17%,100%)
![Page 58: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/58.jpg)
Single-SNP comparisonA random SNP
TP rate
(correctly calling a males male)
FP rate(incorrectly calling females male)
![Page 59: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/59.jpg)
Single-SNP comparisonA non-differentiating SNP
TP rate
(correctly calling a males male)
FP rate(incorrectly calling females male)
![Page 60: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/60.jpg)
Performance of an average SNPwith a common threshold
59 individuals £
![Page 61: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/61.jpg)
CRMA & dChip perform betterfor an average SNP (common threshold)
Number of calls:59£5,608 = 330,872
TP rate
(correctly calling a males male)
FP rate(incorrectly calling females male)
0.85
1.00
0.150.00
Zoom in
CRMA
dChip
CNAG
CNAT
![Page 62: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/62.jpg)
"Smoothing"
![Page 63: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/63.jpg)
No averaging (R=1)Averaging two and two (R=2)Averaging three and three (R=3)
Average across SNPsnon-overlapping windows
threshold
A false-positive(or real?!?)
![Page 64: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/64.jpg)
Better detection rate when averaging(with risk of missing short regions)
R=1(no avg.)
R=2
R=3
R=4
![Page 65: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/65.jpg)
CRMA does better than dChip
CRMA
dChip
![Page 66: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/66.jpg)
CRMA does better than dChip
CRMA
dChipControl for FP rate: 1.0%
CRMA dChipR=1 69.6% 63.1%R=2 96.0% 93.8%R=3 98.7% 98.0%R=4 99.8% 99.6%… … …
²
²
²²
²
²
![Page 67: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/67.jpg)
Comparing methods by “resolution”controlling for FP rate
@ FP rate: 1.0%
CRMA
CNAT
²
²² ² ² ² ²
dChipCNAG
![Page 68: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/68.jpg)
Comparisonacross generations
(100K - 500K - 6.0)
![Page 69: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/69.jpg)
We have HapMap data for several generations of platforms
HapMap (CEU):30 males and 29 females (no children; one excl.
female)
Chromosome X is known: Males (CN=1) & females (CN=2)5,608 SNPs
Platforms:100K, 500K, 6.0.
![Page 70: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/70.jpg)
Resolution comparison- at 1.0% FP
(1.8kb, 60.7%)
100K
500K
GWS6
![Page 71: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/71.jpg)
Summary
![Page 72: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/72.jpg)
Conclusions
• It helps to:– Control for allelic crosstalk.– Sum alleles at PM level: PM = PMA + PMB.– Control for fragment-length effects.
• Resolution: 6.0 (SNPs) > 500K > 100K(or lab effects).
• Currently estimates from CN probes are poor. Not unexpected. Better preprocessing might help.
![Page 73: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/73.jpg)
2008: >30,000,000 loci >x3000?
On January 10, 2008:
Dr Stephen Fodor, CEO of Affymetrix, outlined new products:
Affymetrix has been focusing on new chemistry techniques, such as a new higher yield synthesis technique.
The first product that will be launched - around the first half of 2008 - is an ultra-high resolution copy number tool.
"This product will allow us to analyze the genome at around 30 times the resolution of the current state-of-the-art technology in the marketplace," claimed Fodor.
Source: http://www.labtechnologist.com/
![Page 74: Copy-number estimation on the latest generation of high-density oligonucleotide microarrays Henrik Bengtsson (work with Terry Speed) Dept of Statistics,](https://reader035.fdocuments.net/reader035/viewer/2022081504/56649e265503460f94b156f5/html5/thumbnails/74.jpg)
Segmentation algorithms are the bottlenecks- we need fast algorithms/implementation
Some methods Need! (…or better)
Chip type
# loci n O(n2) time / sample
O(n) time / sample
250K 250,000 1£ 1£ 0.5h 1£ 5.5min
500K 500,000 2£ 4£ 2h 2£ 12min
5.0 1,000,000 4£ 16£ 8h 4£ 27min
6.0 2,000,000 8£ 64£ 32h 8£ 1.0h
? 32,000,000 128£
16,384£ 341
days!
128£ 12h