Post on 28-Jul-2015
James Willey, MD, George Isaac Professor for Cancer Research, University of Toledo Health Sciences Campus
(Conflict: Equity in Accugenomics, Inc., which has an interest in the data presented)
ERCC Synthetic Spike-In Standards to Validate a Competitive Multiplex PCR Amplicon Library Method for Targeted NGS Analysis
Outline
• Rationale for developing a targeted NGS method that employs synthetic cDNA internal standards
• Use of ERCC reference materials in method validation
  – Linear dynamic range
  – Signal-to-analyte response
  – Precision
  – Accuracy
  – Reproducibility with other platforms
• Contribution of target analyte (e.g. cDNA) copies loaded and sequencing counts to stochastic sampling variation
• Evidence that the method markedly reduces sequencing counts required
• Application of the targeted NGS method with synthetic cDNA internal standards in a lung cancer risk test (LCRT)
• Use of ERCC reference materials in reverse transcription efficiency testing
Rationale for Method Development
• Targeted NGS is currently used in cancer diagnostics
  – e.g. bait capture of >3000 exons followed by NGS (Foundation One)
• There is a need for targeted sequencing methods that control for sources of error in library prep:
  – Blomquist et al., PLoS One 2013:
    – “Target enrichment steps, including bait hybridization-, capture and ligation-, or PCR-based strategies may be associated with inter-library variation, in part due to under- or over-loading, signal saturation and compression”
    – “A targeted method that reduces over-sequencing of highly expressed relative to lowly expressed transcripts is needed to be cost-effective”
  – Fu et al., PNAS 2014:
    – “Standard library preparation methods result in the loss of rare transcripts and highlights the need for monitoring library efficiency and for developing more efficient sample preparation methods”
    – For every 1,000 copies of a transcript in the starting sample, only 1-6 copies remained in the sequencing library after bait capture.
    – “A more straightforward way to measure library preparation efficiency is to add a known number of barcoded RNA molecules into the sample and determine how many make it through the library preparation steps.”
PCR Amplicon Library Preparation for DNA Analysis
Example: Ion AmpliSeq™ Target Selection Technology (LifeTech)
• Targeted amplicon library prep for DNA analysis
• PCR amplify each cDNA with mixture of primers
• Second round of amplification with bar-code primers
• Excellent for targeted analysis of DNA variants
• Reduced complexity, read count requirement, and cost
Targeted PCR Amplicon Library Prep for RNASeq Is More Complicated
• Genes are expressed over a wide range, so multiplex PCR of many gene targets requires
  – Limitation of primer concentration to ensure amplification of lowly expressed targets
  – Limitation of cycle number to avoid convergence
• Problems
  – These conditions limit measurement of lowly expressed genes
  – Deep sequencing (massive oversampling of highly expressed genes) is still required
Solutions:
• Measure each target relative to a known copy number of synthetic cDNA internal standard
• Limit primer concentration, use touch-down PCR to optimize low primer condition
  – This forces analytes represented over a wide range to converge towards equimolar concentration
Targeted Amplicon Library Prep for NGS
Reagents
• Prepare primer mixture: comprising primers for each target (NT)
• Prepare synthetic cDNA internal standard (IS) mixture: comprising IS for each target
Multiplex PCR
• Combine cDNA sample, IS mixture, and primer mixture at limiting concentration
• Amplify simultaneously using touchdown PCR:
  – multiple cDNA target NTs
  – known copies of competitive template IS for each respective NT
Advantages
• Controls for variation in PCR efficiency by measurement relative to IS
• Each target converges towards equivalence at plateau during multiplex PCR
• Starting relative representation is preserved because each target is measured relative to known starting IS copy number
Multiplex Competitive RT-PCR Method Description
Multiplex Competitive PCR Amplicon Library Prep
[Figure: Native Targets (NT) and synthetic cDNA Internal Standards (IS) are mixed in varying ratios. Target-specific forward or reverse priming sites are identical between Native Target and Internal Standard. Nucleotide substitutions allow for discrimination of Native Target from synthetic cDNA Internal Standard (IS).]
Titration of Synthetic cDNA Internal Standards Relative to gDNA or cDNA
[Figure: internal standards titration. x-axis: Internal Standard Concentration (copies per library preparation); y-axis: Ratio of Sequencing Counts NT:IS; series: endogenous cDNA, ERCC controls, gDNA.]
Fine (half-log) titration to assess analytical performance
• NT:IS ratio linear from <1:100 to >100:1
• Measure >10,000-fold expression range with a single IS concentration (e.g., 10^5 IS copies/library, circled)
Standardized RNA Sequencing (STARSEQ) Data Analysis
Empirically Supported Conclusions
• Synthetic cDNA IS are in a fixed relationship relative to each other
• Competition between each NT and its respective IS preserves the original concentration for each NT
• >10,000-fold expression range may be measured with a single IS concentration
• The proportional relationship among native targets in the original sample is preserved during amplification and sequencing
• Measurement relative to IS controls for variation in PCR efficiency and downstream library preparation steps
Data Analysis: Calculation of native target abundance in sample
• Determine ratio of sequencing counts for NT and IS (NT:IS)
• Multiply NT:IS ratio by internal standard (IS) concentration in amplicon library prep
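The two-step calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the STARSEQ software: the function name, argument names, and the example read counts are hypothetical.

```python
def native_target_copies(nt_reads: int, is_reads: int, is_copies_loaded: float) -> float:
    """Estimate native target (NT) copies in the library prep.

    Step 1: ratio of sequencing counts, NT:IS.
    Step 2: multiply that ratio by the known IS copies loaded.
    """
    if is_reads <= 0:
        # Without IS reads there is no reference to measure against.
        raise ValueError("internal standard read count must be positive")
    nt_to_is_ratio = nt_reads / is_reads
    return nt_to_is_ratio * is_copies_loaded


# Hypothetical example: 2,500 NT reads, 500 IS reads, 10^5 IS copies loaded.
estimate = native_target_copies(nt_reads=2500, is_reads=500, is_copies_loaded=1e5)
print(f"Estimated NT copies: {estimate:.0f}")
```

Because only the NT:IS ratio enters the calculation, the absolute number of reads assigned to a target does not need to reflect its abundance.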
ERCC Reference Materials to Assess Accuracy of Targeted NGS
• Excellent signal-to-analyte response (slope close to 1.0) for 26 measured ERCC RM targets
• Correlation coefficient >0.94
[Figure: Correlation With Known ERCC Concentration]
Systematic Differences in ERCC RM Measured/Expected Values
• Inter-assay variation in systematic difference
• Multiple possible causes of difference
  – Inter-assay difference in RT efficiency
  – Inter-assay difference in quantification of synthetic RNA (ERCC) or cDNA internal standard
ERCC Reference Materials to Assess Accuracy of Targeted NGS
• Increased correlation after correction for systematic differences in measured values
  – Correlation coefficient >0.99
[Figures: Correlation With Known ERCC Concentration; Correlation After Correction for Systematic Variation]
ERCC Reference Materials to Assess Targeted NGS Fold-Change Measurement Capability
• Excellent fold-change detection consistent with precision and signal response
• Known concentration of each ERCC RM served as true values
  – Sensitivity: Correct/Incorrect classification of two SEQC/ERCC dilutions known to be different
  – 1-Specificity: Incorrect/Correct classification of two SEQC/ERCC dilutions known to be not different
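One way to read these two definitions is as the usual true positive rate and false positive rate over pairs of dilutions. The sketch below assumes that reading. The function name and the example labels are hypothetical, and the rule used to call two dilutions "different" is not given on the slide, so the calls are taken as input.

```python
from typing import Sequence, Tuple


def sensitivity_and_one_minus_specificity(
    truly_different: Sequence[bool], called_different: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, 1 - specificity) for paired dilution comparisons.

    truly_different[i]  : True if the known ERCC concentrations differ.
    called_different[i] : True if the assay classified the pair as different.
    """
    if len(truly_different) != len(called_different):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(truly_different, called_different))
    n_different = sum(1 for truth, _ in pairs if truth)
    n_same = len(pairs) - n_different
    if n_different == 0 or n_same == 0:
        raise ValueError("need at least one truly different and one truly same pair")
    true_positives = sum(1 for truth, call in pairs if truth and call)
    false_positives = sum(1 for truth, call in pairs if not truth and call)
    return true_positives / n_different, false_positives / n_same


# Hypothetical labels for six dilution pairs.
truth = [True, True, True, True, False, False]
calls = [True, True, True, False, True, False]
print(sensitivity_and_one_minus_specificity(truth, calls))
```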
Method Reproducibility
• MAQC Sample A endogenous targets
  – Inter-day (A)
  – Inter-lab (B)
  – Inter-library (C)
  – Inter-lab and library (D)
• MAQC sample accuracy (observed relative to expected)
  – Sample C endogenous targets (E)
  – Sample D endogenous targets (F)
[Figure: panels A to F]
STARSEQ Inter-platform Comparison
ERCC Reference Materials to Assess Effect of Stochastic Sampling on Precision
• Stochastic sampling effect on analytical variation increases below 1000 copies loaded
• Same effect observed with all platforms in MAQC study
[Figures: NGS Assessment Using Known ERCC Synthetic RNA Concentrations; MAQC Cross-Platform Comparison Using Known Synthetic cDNA Concentrations]
Hypothesis: The coefficient of variation (CV) for amplicon-based NGS assay measurements may be partly predicted by Poisson (i.e. stochastic) sampling effects for a nucleic acid target at two key points:
1) input molarity (i.e. number of intact molecules)
2) sequencing coverage (i.e. read counts)
Design: Based on a two-point stochastic sampling model we derived equations for expected coefficient of variation.
Model 1 - Number of sequence reads dictates assay CV:
$$\text{Expected CV} = -1 + 10^{\left(\text{Sequence Reads}^{-0.54}\right)}$$
Model 2 - Number of molecules input dictates assay CV:
$$\text{Expected CV} = -1 + 10^{\left(\text{Molecules Input}^{-0.54}\right)}$$
Model 3 - Number of molecules input and sequence reads dictates assay CV:
$$\text{Expected CV} = -1 + 10^{\left(\text{Molecules Input}^{-0.54} + \text{Sequence Reads}^{-0.54} - [\text{Molecules Input} \times \text{Sequence Reads}]^{-0.54}\right)}$$
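The three models can be evaluated numerically with a short Python sketch. It assumes the slide equations are read with −0.54 as the exponent on each count and the bracketed sum as the exponent of 10; the function names and the example inputs are hypothetical.

```python
EXPONENT = -0.54  # exponent applied to each count, as read from the slide


def expected_cv_single(count: float) -> float:
    """Models 1 and 2: one sampling step (sequence reads OR molecules input)."""
    return -1 + 10 ** (count ** EXPONENT)


def expected_cv_combined(molecules_input: float, sequence_reads: float) -> float:
    """Model 3: both sampling steps (molecules input AND sequence reads)."""
    power = (
        molecules_input ** EXPONENT
        + sequence_reads ** EXPONENT
        - (molecules_input * sequence_reads) ** EXPONENT
    )
    return -1 + 10 ** power


# Hypothetical inputs: 100 intact molecules loaded, 1,000 reads for the target.
print("Model 1:", expected_cv_single(1000))
print("Model 2:", expected_cv_single(100))
print("Model 3:", expected_cv_combined(100, 1000))
```

Under this reading, the Model 3 value is never smaller than either single-step value, so whichever of the two counts is more limiting sets a floor on the expected CV.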
Design:
• These expectation models were tested against data derived from cross-mixtures of two cell lines, H23 and H520
  – These lines were tested and known to be homozygous for opposite alleles at four polymorphic sites: rs769217, rs1042522, rs735482 and rs2298881
• The cell lines were mixed to produce limiting molecule inputs
• Following multiplex competitive PCR, the library preparations were diluted to produce limiting sequencing inputs
Results, Model 1 (number of sequence reads dictates assay CV):
• 46 quadruplicate measurements; R² = −0.70
• Measured CV 13-fold higher than expected
Results, Model 2 (number of molecules input dictates assay CV):
• 46 quadruplicate measurements; R² = 0.24
• Measured CV 1.5-fold higher than expected
Results, Model 3 (number of molecules input and sequence reads dictates assay CV):
• 46 quadruplicate measurements; R² = 0.74
• Measured CV 1.01-fold relative to expected
Effect of Low Molecule Input on Measurement Reliability
ERCC1 - rs735482: important for nucleotide excision repair and, in some studies, indicative of oxaliplatin response in ovarian cancer
• When high molecule copy number is loaded, excellent measured/expected accuracy
• When low molecule copy number is loaded, no meaningful measured/expected relationship
Summary
• For a variety of reasons, many clinical specimens are
  – Small samples
  – Poor quality DNA or RNA, and therefore have a low fraction of measurable target molecules
  – Limited to one technical replicate
• Based on the findings presented here, knowledge of both
  – Nucleic acid molecule input amount and
  – Sequencing coverage
  is important for accurate quantitative reporting in NGS
Targeted NGS Reduction in Over-Sampling Due to Convergence During PCR
• (A) Each abundant or rare analyte native template (NT) is measured relative to a known copy number of its respective synthetic cDNA IS
• (B) Abundant NT and IS amplify in parallel and plateau at an early cycle due to primer limitation. Rare NT and IS have sufficient primer to continue amplifying and therefore converge towards abundant NT and IS
• (C) Numerical representation of graph in (B)
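The convergence described in (B) can be illustrated with a toy simulation. This is a deliberately simplified sketch, not the authors' model: it assumes perfect doubling each cycle, that each target's NT and IS share one primer pair and amplify identically, and that amplification stops once the combined product reaches a fixed primer-limited ceiling. The function name, copy numbers, and ceiling are hypothetical.

```python
from typing import Tuple


def primer_limited_pcr(
    nt_copies: float, is_copies: float, primer_limit: float, cycles: int
) -> Tuple[float, float]:
    """Amplify NT and IS together until their shared primer pool is exhausted."""
    nt, is_ = float(nt_copies), float(is_copies)
    for _ in range(cycles):
        total = nt + is_
        if total >= primer_limit:
            break  # plateau: no primer left for this target
        # Double, unless the remaining primer only allows a partial step.
        factor = min(2.0, primer_limit / total)
        nt *= factor
        is_ *= factor
    return nt, is_


# Hypothetical abundant and rare targets, same primer ceiling for both.
abundant = primer_limited_pcr(1e6, 1e6, primer_limit=1e10, cycles=35)
rare = primer_limited_pcr(1e2, 1e3, primer_limit=1e10, cycles=35)

for name, (nt, is_) in (("abundant", abundant), ("rare", rare)):
    print(f"{name}: total product {nt + is_:.3g}, NT:IS ratio {nt / is_:.3f}")
```

In this toy model both targets end at the same total product, while each NT:IS ratio is unchanged from its starting value, which is the property the method relies on for quantification.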
ERCC Reference Materials to Assess Reduction in Over-sampling
• There was a reduction in the number of sequencing reads required to measure all analytes in the library
  – 6.9 × 10^3-fold reduction in number to measure the ERCC RM targets
  – 1.6 × 10^4-fold reduction in number to measure endogenous targets
SUMMARY OF PERFORMANCE CHARACTERISTICS
• High reproducibility (R²=0.997)
  – 97% accuracy to detect 2-fold change (measured with ERCC)
• High inter-day, inter-site, inter-library concordance (R²>0.97)
• High cross-platform concordance with:
  – Taqman qPCR (R²=0.96)
  – Whole transcriptome RNA-sequencing following traditional library prep with Illumina NGS kits (R²=0.94)
• Convergence during PCR reduces sequencing reads required
  – Quantify >100 targeted transcripts expressed over 10^7-fold
  – Whole transcriptome: 2.3 × 10^9 sequencing reads
  – Targeted method: 1.4 × 10^5 sequencing reads
    – More than 10,000-fold reduction
• Reveals stochastic sampling contribution to analytical variation
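As a quick arithmetic check of the read-count figures in this summary, the fold reduction follows directly from the two quoted totals. The variable names below are illustrative only.

```python
# Sequencing reads quoted in the summary above.
whole_transcriptome_reads = 2.3e9
targeted_method_reads = 1.4e5

fold_reduction = whole_transcriptome_reads / targeted_method_reads
print(f"Fold reduction in reads: {fold_reduction:.2e}")
```

The quotient is about 1.6 × 10^4, consistent with both the "more than 10,000-fold" statement and the reduction reported for endogenous targets.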
Rationale for LCRT-AGx
• Recently completed National Lung Screening Trial results
  – NEJM, July 2011 and http://www.cancer.gov/nlst
  – Reported 20% reduction in mortality resulting from three annual CTs
    – This would translate into prevention of 30,000 deaths/year in the US alone
    – Projections indicate that with 5-6 annual CTs, 80,000 deaths/year can be prevented
• Annual CT screening is now standard of care
  – Based on recommendations of United States Preventive Services Task Force (USPSTF), American Cancer Society (ACS), American Thoracic Society (ATS), and National Comprehensive Cancer Network (NCCN)
• Yet all consensus groups urged efforts to establish biomarkers that better identify individuals at highest risk
  – Cited costs, high false positive rate, and associated complications
Prospective Multi-Site Validation Trial of LCRT (RC2 CA148572)
• Multi-institutional prospective nested case control study of 14-gene lung cancer risk test
• Mayo Clinic, University of Michigan, Ohio State University, Henry Ford Hospital, Vanderbilt, Tennessee VA Hospital, University of Toledo, Toledo Hospital, Cleveland Clinic (Pending: University of Colorado/National Jewish Hospital, Fairfax/Inova, Wayne State, others)
• Will be completed in 2014-15
• American Recovery and Reinvestment Act (ARRA) funding
• If the RC2 CA148572 study validates previous results, will submit to FDA for approval for commercial use as a diagnostic test for lung cancer risk
• Individuals with a positive Lung Cancer Risk Test will be candidates for trials that aim to identify lung cancer at an early stage through screening by CAT scan of the chest
Need to Develop Better Gene Expression Platform!
• Necessary characteristics
  – Higher throughput
  – Less expensive
  – Quality controlled
  – Use less RNA
  – Tolerate lower quality RNA
• Choice
  – Targeted Next-Generation RNAseq
  – Multiplex competitive PCR amplicon libraries
    – STARSEQ
Development of Lung Cancer Risk Test (LCRT) on Standardized RNA Sequencing (STARSEQ) Platform
• STARSEQ method highly correlated with capillary electrophoresis (CE) method used to report LCRT performance in Blomquist et al. (Cancer Research, 2009)
ERCC Reference Materials to Optimize RT Efficiency
• A Reverse Transcription Standards Mixture (RTSM) was prepared by mixing known concentrations of ERCC171 RNA and ERCC113 cDNA
• Following RT, the ratio of ERCC171/ERCC113 cDNA is used as a measure of RT efficiency
• This controls for inter-sample variation in RT interference, and inter-experimental variation in reagent (e.g. RT enzyme) quality/quantity
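The ratio-based RT efficiency measure can be sketched as follows. The slide states only that the ERCC171/ERCC113 cDNA ratio is used; dividing by the input ratio of the two standards, so that complete conversion gives 1.0, is an assumption added here for illustration. The function name, argument names, and example values are hypothetical.

```python
def rt_efficiency(
    ercc171_cdna_measured: float,
    ercc113_cdna_measured: float,
    ercc171_rna_input: float,
    ercc113_cdna_input: float,
) -> float:
    """Estimate RT efficiency from the RTSM spike-in.

    ERCC171 enters as RNA and must be reverse transcribed to be measured;
    ERCC113 enters as cDNA and serves as the reference.
    """
    if min(ercc113_cdna_measured, ercc171_rna_input, ercc113_cdna_input) <= 0:
        raise ValueError("reference and input amounts must be positive")
    measured_ratio = ercc171_cdna_measured / ercc113_cdna_measured
    # Assumption: normalize by the known input ratio of the two standards.
    input_ratio = ercc171_rna_input / ercc113_cdna_input
    return measured_ratio / input_ratio


# Hypothetical example: equal inputs, 3,000 vs 10,000 copies measured after RT.
print(rt_efficiency(3000, 10000, 1e5, 1e5))
```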
[Figures: Effect of RNA concentration and RT priming method on yield of cDNA; Effect of RNA input on RH-primed RT efficiency (ERCC 171/113 cDNA)]
Summary
• Targeted NGS method that employs synthetic cDNA internal standards
  – Has excellent
    – Linear dynamic range
    – Signal-to-analyte response
    – Precision
    – Accuracy
    – Reproducibility with other platforms
  – Markedly reduces sequencing counts required
• Direct measurement of copies loaded and sequencing counts in each diagnostic assay is critical to ensure
  – Avoidance of stochastic sampling variation and reliable measurement
• Application of targeted NGS method with synthetic cDNA internal standards
  – Enabled development of a reliable, low-cost lung cancer risk test (LCRT)
• ERCC reference materials used effectively in testing for reverse transcription efficiency
Acknowledgements
University of Toledo: Thomas Blomquist, M.D./Ph.D.; Erin Crawford, M.S.; Jeff Hammersley, M.D.; Dan Olson, M.D., Ph.D.; Ragheb Assaly, M.D.; Younsook Yoon, M.D.; DA Hernandez; Lauren Stanoszek, B.A.
Accugenomics, Inc.: Tom Morrison, Ph.D.; Brad Austermiller, B.S.; Nick Lazaridis, Ph.D.
Vanderbilt: Pierre Massion, M.D./Ph.D.
Mayo Clinic: Dave Midthun, M.D.
Henry Ford Hospital System: Chris Johnson, Ph.D.; Albert Levin, Ph.D.; Paul Kvale, M.D.; Mike Simoff, M.D.
Ohio State University: Patrick Nana-Sinkam, M.D./Ph.D.
University of Michigan: Doug Arenberg, M.D./Ph.D.
MUSC: Gerard Silvestri, M.D./Ph.D.
Cleveland Clinic: Peter Mazzone, M.D.
Innova/Fairfax: Steven Nathan
Toledo Hospital: Ron Wainz, M.D.
Mercy/St.Vincent’s Hospital: Jim Tita, M.D.