Rapid Detection of Aneuploidy from Multiplexed Single Cell Samples



Adam N. Harris, Marcela Carvallo, David Mandelman, and Mark Andersen. Thermo Fisher Scientific, 5791 Van Allen Way, Carlsbad, CA, 92008, USA.

FIGURE 3. CNV CALLS FROM ION REPORTER™ SOFTWARE

ABSTRACT

Aneuploidy in embryos is the leading cause of failure for in vitro fertilization (IVF) procedures.¹ Pre-implantation Genetic Screening (PGS) is used to identify euploid embryos for implantation to increase successful pregnancies and decrease the number of cycles required to obtain them. PGS using fluorescence in situ hybridization has fallen out of favor and been replaced by comparative genome hybridization on microarrays.¹ More recently, high throughput sequencing (HTS) technologies have been employed for cost-effective PGS on multiple samples.¹

We report results using a streamlined Whole Genome Amplification (WGA) and library generation method followed by HTS and data analysis to detect aneuploidies in single cells in under 12 hr. Using DNA barcodes, we multiplex libraries to reduce the per-sample cost. Our analysis platform compares reads per chromosomal region to an informatics control built from a baseline of normal cell samples. Using this method, we show that trisomy of the smallest chromosome (21) can be detected with high sensitivity and specificity.

FIGURE 1. ANEUPLOIDY DETECTION WORKFLOW

MATERIALS AND METHODS

Library Preparation: Whole Genome Amplification was carried out on cells sorted into plates by FACS using the PicoPLEX™ WGA kit (Rubicon Genomics, Inc.) according to the recommended protocol. Four replicates each were generated from single GM02732 cells (trisomy 18), single GM02767 cells (trisomy 21), and 10-cell aliquots of GM01739 cells (normal control). One GM02767 cell failed to yield DNA after WGA, possibly indicating an empty well from FACS. After AMPure™ XP bead purification (Beckman Coulter, Inc.), 500 ng of material from each successful WGA reaction was enzymatically fragmented and ligated to barcode adapters using the Ion Xpress™ Library Preparation Kit. No additional amplification was performed. AMPure™ XP beads were used to obtain a broad range of fragment sizes. Libraries were quantitated by qPCR and normalized.

Template preparation: Isothermal Amplification was carried out using 95M total pooled library molecules, a reconstituted enzyme pellet, primer mix, Ion Sphere™ Particles (ISPs), and Start Solution in a 1200 µL reaction incubated for 25 min at 40 °C. The ISPs were washed, and then template-positive ISPs were enriched using the Ion OneTouch™ ES instrument.

Sequencing: Semiconductor sequencing was performed for 250 flows on an Ion 318™ chip with the Ion PGM™ Hi-Q™ Sequencing Kit.

Data Analysis: Signal processing, basecalling, and alignment to hg19 were performed using Torrent Suite™ Software v4.5. An extra 30 bp were trimmed from read 5' ends to remove common and random primers from the original WGA fragment ends. BAM files were randomly subsampled to the indicated number of reads and then uploaded to a local Ion Reporter™ Server (v4.4). A baseline was generated using four 10-cell GM01379 control samples which underwent the same treatment regimen as the experimental samples. Another sequencing run with the same library pool was used to generate a second baseline. Finally, a third baseline was created using all 8 control samples from both chips. The default Low-Pass Whole-Genome Aneuploidy workflow was used to call CNVs; all CNVs with confidence higher than 0.01 are shown.
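The trimming and random subsampling steps described under Data Analysis can be illustrated with a minimal Python sketch. The poster does not say which tool performed these steps, so the function names `trim_5prime` and `subsample_reads` are hypothetical, reads are treated as plain sequence strings in the orientation they were sequenced, and reservoir sampling (Algorithm R) is only one possible way to draw a fixed number of reads uniformly at random.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def trim_5prime(seq: str, n_bases: int = 30) -> str:
    """Drop the first n_bases from a read given in its sequenced orientation.

    Assumption: 30 bases, matching the extra 5' trim described in the text.
    """
    return seq[n_bases:]


def subsample_reads(reads: Iterable[T], n: int, seed: int = 0) -> List[T]:
    """Return up to n reads chosen uniformly at random from a stream.

    Uses reservoir sampling (Algorithm R), so the full read set never has
    to be held in memory. If the stream has fewer than n reads, all of
    them are returned.
    """
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < n:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = read
    return reservoir
```

For example, `subsample_reads(all_reads, 150_000)` would yield a read set of the size used for the baseline comparison, where `all_reads` is any iterable of read records.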

CONCLUSIONS

• We demonstrate a workflow that goes from single cells to CNV calls in under 12 hr. The workflow is streamlined by using enzymatic fragmentation, Isothermal Amplification for ISP templating, and 250 flows for sequencing on an Ion 318™ chip.

• We analyzed 11 samples (seven single cells and four 10-cell controls) on a single Ion 318™ chip and obtained an average of 450,000 reads per sample. Our analysis shows that T21 can be called with as few as 30,000 reads per sample and should be highly robust at ~150,000 reads.

• Ion Reporter™ Software allows creation of an informatics baseline that can use control samples from a different sequencing run.

• An apparent false positive was detected but with clearly lower confidence than the true positives, which should allow it to be filtered out. However, because we are working with cell lines, we cannot rule out the possibility that this 18 Mbp deletion actually existed in the cell analyzed.

• A similar study used emulsion PCR instead of isothermal amplification and demonstrated improved outcomes for in vitro fertilization.²

REFERENCES

1. Stern (2014). J. Clin. Med. 3:280-309.
2. Łukaszuk et al. (2015). Fertil. Steril., Jan 23 Epub ahead of print.

TRADEMARKS

For Research Use Only. Not for use in diagnostic procedures.

Ion Torrent is a Thermo Fisher Scientific brand. © 2014 Thermo Fisher Scientific, Inc. All rights reserved. AMPure is a trademark of Beckman Coulter, Inc. PicoPLEX™ is a trademark of Rubicon Genomics, Inc. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries.

Thermo Fisher Scientific • 5791 Van Allen Way • Carlsbad, CA 92008 • www.lifetechnologies.com
©2013 Life Technologies Corporation. All rights reserved. The trademarks mentioned herein are the property of Life Technologies Corporation and/or its affiliate(s) or their respective owners.

FIGURE 2. SEQUENCING METRICS

Over 5 million reads were produced in total on an Ion 318™ chip. The read length distribution was for the most part dictated by the number of flows (250), which limits the read length to ~107bp after removal of adapter and primer sequences (see Materials & Methods). The flow number was optimized for a rapid run and analysis. An average of 450,000 mapped reads were obtained per sample.

RESULTS

A) A table of CNV calls from ~350,000 subsampled reads from each of four single male trisomy 18 (T18) cells and three single female trisomy 21 (T21) cells compared to a baseline of four control samples (GM01379) run on the same chip. There were no false negatives. A single false positive (highlighted red) was found in one cell but with a much lower confidence score than the expected chr18 CNV.

B) Representations of coverage and CNVs from IGV launched from Ion Reporter™ Software. Each set of traces represents results from one cell. The top track shows normalized coverage. The middle track places average lines on top of that coverage. The bottom track shows identified CNVs. The false positive in the last GM02732 sample is indicated by a red oval.

FIGURE 4. CNV CONFIDENCE AND COVERAGE

FIGURE 5. INFORMATICS BASELINES

To avoid running control samples in every experiment, Ion Reporter™ Software allows creation of an informatics baseline which merges data from multiple control samples across different runs. Noise is smoothed and systematic effects are taken into account when determining CNV calls. Here, we compared confidence scores for each CNV sample at 150,000 subsampled reads for three different baselines. The first baseline was from the 4 control samples run on the same chip as the experimental samples (also used in Figs 3 & 4). The second baseline was created from the same libraries sequenced on a separate Ion 318™ chip. The final baseline combined data from both runs. There was little change to confidence scores across the 3 baselines. Accurate calls for T18 and T21 could be made using a baseline generated from a different sequencing run. The chr4 false positive call remained.
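The idea of comparing a sample against a baseline merged from control samples across runs can be pictured with the following minimal Python sketch. It is not the algorithm used by Ion Reporter™ Software, which the poster does not describe. It assumes a deliberately simple model in which each chromosome's share of total reads is averaged over the controls and a sample's share is scaled against that average. All function names and the toy read counts are hypothetical, and only autosomes are considered.

```python
from typing import Dict, List

Counts = Dict[str, int]  # chromosome name -> mapped read count


def fractions(counts: Counts) -> Dict[str, float]:
    """Convert raw read counts into each chromosome's share of all reads."""
    total = sum(counts.values())
    return {chrom: n / total for chrom, n in counts.items()}


def build_baseline(controls: List[Counts]) -> Dict[str, float]:
    """Average the per-chromosome read fractions over control samples.

    Controls may come from one run or several; normalizing to fractions
    first keeps runs with different depths on the same scale.
    """
    fracs = [fractions(c) for c in controls]
    return {
        chrom: sum(f[chrom] for f in fracs) / len(fracs)
        for chrom in fracs[0]
    }


def estimate_copy_number(
    sample: Counts, baseline: Dict[str, float], normal_ploidy: int = 2
) -> Dict[str, float]:
    """Scale the sample-to-baseline fraction ratio by the normal ploidy."""
    sample_frac = fractions(sample)
    return {
        chrom: normal_ploidy * sample_frac[chrom] / baseline[chrom]
        for chrom in baseline
    }


# Toy two-chromosome genome: controls from two runs of different depth.
baseline = build_baseline([
    {"chr1": 900, "chr21": 100},
    {"chr1": 1800, "chr21": 200},
])
# A sample with 1.5x the expected chr21 reads, as a trisomy would produce.
copy_numbers = estimate_copy_number({"chr1": 900, "chr21": 150}, baseline)
```

In this toy case the chr21 estimate rounds to 3 and the chr1 estimate rounds to 2. Because extra reads on one chromosome slightly dilute the fractions of the others, the raw estimates are not exact integers.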

FIGURE 6. KARYOTYPE VIEW (T21 SAMPLE)

A.

Cell line  Subsampled reads  Chr    Ploidy  Start (bp)  End (bp)    Size (bp)   Confidence  Precision
GM02732    350,000           chr18  3       10,000      78,067,248  78,057,248  31.1        11.8
GM02732    350,000           chr18  3       10,000      78,067,248  78,057,248  27.9        14.5
GM02732    350,000           chr18  3       10,000      78,067,248  78,057,248  32.3        11.3
GM02732    350,000           chr18  3       10,000      78,067,248  78,057,248  31.4        11.6
GM02732    350,000           chr4   1       74,737,301  90,793,436  16,056,135  1.6         1.6
GM02767    350,000           chr21  3       10,000      48,119,895  48,109,895  14.0        6.0
GM02767    350,000           chr21  3       10,000      48,119,895  48,109,895  12.4        7.3
GM02767    350,000           chr21  3       10,000      48,119,895  48,109,895  11.0        8.2

B. [IGV coverage and CNV tracks: GM02767 (T21) and GM02732 (T18)]
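The CNV calls tabulated above show a wide gap between the confidence of the expected trisomy calls (11.0 and higher) and that of the chr4 call (1.6). A confidence filter of the kind suggested in the Conclusions could be sketched in Python as follows. The cutoff of 5.0 is a hypothetical value chosen only because it falls inside that gap; the poster does not recommend a specific threshold, and `CnvCall` and `filter_calls` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CnvCall:
    cell_line: str
    chrom: str
    ploidy: int
    confidence: float


# Confidence values copied from the table of calls at 350,000 reads.
CALLS: List[CnvCall] = [
    CnvCall("GM02732", "chr18", 3, 31.1),
    CnvCall("GM02732", "chr18", 3, 27.9),
    CnvCall("GM02732", "chr18", 3, 32.3),
    CnvCall("GM02732", "chr18", 3, 31.4),
    CnvCall("GM02732", "chr4", 1, 1.6),
    CnvCall("GM02767", "chr21", 3, 14.0),
    CnvCall("GM02767", "chr21", 3, 12.4),
    CnvCall("GM02767", "chr21", 3, 11.0),
]


def filter_calls(calls: List[CnvCall], min_confidence: float) -> List[CnvCall]:
    """Keep only calls whose confidence meets the cutoff."""
    return [c for c in calls if c.confidence >= min_confidence]


# Hypothetical cutoff: anything between 1.6 and 11.0 separates these calls.
kept = filter_calls(CALLS, min_confidence=5.0)
```

With this cutoff, the seven chr18 and chr21 calls are kept and the chr4 call is dropped. A threshold for routine use would need validation on many more samples than the seven shown here.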

Confidence scores from four T18 samples (blue diamonds) and three T21 samples (red squares) as a function of the number of total reads used in Ion Reporter™ Software analysis. Data from the single T18 sample with a false positive deletion on chr4 are also shown (green triangles), although these disappear under 150,000 reads and never result in a confidence higher than 2. Correct calls were made for all samples down to 30,000 reads, below which additional false positives were introduced. At 150,000 reads per sample we would expect to support at least 24 samples per Ion 318™ chip.
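The multiplexing estimate follows from dividing the chip's read output by the reads needed per sample. The short Python sketch below works through that arithmetic using the poster's figures of over 5 million reads per chip and 150,000 reads per sample. The `usable_fraction` parameter is a hypothetical allowance for uneven barcode representation and is not a value reported here.

```python
def samples_per_chip(
    total_reads: int, reads_per_sample: int, usable_fraction: float = 1.0
) -> int:
    """Number of samples a chip supports at a given per-sample read depth.

    usable_fraction (hypothetical) discounts the chip output to leave
    headroom for samples that receive fewer reads than average.
    """
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be in (0, 1]")
    return int(total_reads * usable_fraction) // reads_per_sample


upper_bound = samples_per_chip(5_000_000, 150_000)  # → 33
```

The resulting upper bound of 33 samples assumes perfectly even read distribution, so the stated expectation of at least 24 samples per chip is the more conservative figure.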

[Plot: Effect of Read Depth on Call Confidence. x-axis: Number of Subsampled Reads (0 to 400,000); y-axis: Confidence Score (0 to 40); series: T18, T21, chr4 18Mbp FP]

[Plot: Alternative Baselines. x-axis: CNV Location (chr18, chr18, chr18, chr18, chr21, chr21, chr21, chr4); y-axis: Confidence Score (0 to 25); series: Same Run, Different Run, Both Runs]