20140711 2 j_willey_ercc2.0_workshop

James Willey, MD George Isaac Professor for Cancer Research University of Toledo Health Sciences Campus (Conflict: Equity in Accugenomics, Inc. which has interest in data presented) ERCC Synthetic Spike In Standards To Validate Competitive Multiplex PCR Amplicon Library Method For Targeted NGS Analysis

Transcript of 20140711 2 j_willey_ercc2.0_workshop

Page 1: 20140711 2 j_willey_ercc2.0_workshop

James Willey, MD George Isaac Professor for Cancer Research University of Toledo Health Sciences Campus

(Conflict: Equity in Accugenomics, Inc. which has interest in data presented)

ERCC Synthetic Spike In Standards To Validate Competitive Multiplex PCR Amplicon Library Method For Targeted NGS Analysis

Page 2: 20140711 2 j_willey_ercc2.0_workshop

Outline
• Rationale for Developing Targeted NGS Method that Employs Synthetic cDNA Internal Standards
• Use of ERCC Reference Materials in Method Validation
  • Linear dynamic range
  • Signal-to-analyte response
  • Precision
  • Accuracy
  • Reproducibility with other platforms
• Contribution of target analyte (e.g. cDNA) copies loaded and sequencing counts to stochastic sampling variation
• Evidence that method markedly reduces sequencing counts required
• Application of targeted NGS method with synthetic DNA cDNA internal standards in a lung cancer risk test (LCRT)
• Use of ERCC reference materials in reverse transcription efficiency testing

Page 3: 20140711 2 j_willey_ercc2.0_workshop

Rationale for Method Development
• Targeted NGS currently is used in cancer diagnostics
  • e.g. Bait capture of >3000 exons followed by NGS (Foundation One)
• There is a need for targeted sequencing methods that control for sources of error in library prep:
  • Blomquist et al., PLoS One 2013:
    • “Target enrichment steps, including bait hybridization-, capture and ligation-, or PCR-based strategies may be associated with inter-library variation, in part due to under- or over-loading, signal saturation and compression”
    • “A targeted method that reduces over-sequencing of highly expressed relative to lowly expressed transcripts is needed to be cost-effective”
  • Fu et al., PNAS 2014:
    • “Standard library preparation methods result in the loss of rare transcripts and highlights the need for monitoring library efficiency and for developing more efficient sample preparation methods”
    • For every 1,000 copies of a transcript in the starting sample, only 1-6 copies remained in the sequencing library after bait capture.
    • “A more straightforward way to measure library preparation efficiency is to add a known number of barcoded RNA molecules into the sample and determine how many make it through the library preparation steps.”

Page 4: 20140711 2 j_willey_ercc2.0_workshop

Targeted Amplicon Library Prep for NGS

PCR Amplicon Library Preparation for DNA Analysis
Example: Ion AmpliSeq™ Target Selection Technology (LifeTech)
• Targeted amplicon library prep for DNA analysis
• PCR amplify each cDNA with mixture of primers
• Second round of amplification with bar-code primers
• Excellent for targeted analysis of DNA variants
• Reduced complexity, read count requirement, and cost

Targeted PCR Amplicon Library Prep for RNASeq is more complicated
• Genes are expressed over a wide range, so multiplex PCR of many gene targets requires
  • Limitation of primer concentration to ensure amplification of lowly expressed targets
  • Limitation of cycle number to avoid convergence
• Problems
  • These conditions limit measurement of lowly expressed genes
  • Deep sequencing (massive oversampling of highly expressed genes) still required

Solutions:
• Measure each target relative to known copy number of synthetic cDNA internal standard
• Limit primer concentration, use touch-down PCR to optimize low primer condition
  • This forces analytes represented over wide range to converge towards equimolar concentration

Page 5: 20140711 2 j_willey_ercc2.0_workshop

Multiplex Competitive RT-PCR Method Description

Reagents
• Prepare primer mixture: comprising primers for each target (NT)
• Prepare synthetic cDNA internal standard (IS) mixture: comprising IS for each target

Multiplex PCR:
• Combine cDNA sample, IS mixture, and primer mixture at limiting concentration
• Amplify simultaneously using touchdown PCR:
  • multiple cDNA target NTs
  • known copies of competitive template IS for each respective NT

Advantages
• Controls for variation in PCR efficiency by measurement relative to IS
• Each target converges towards equivalence at plateau during multiplex PCR
• Starting relative representation is preserved because each target is measured relative to known starting IS copy number

Page 6: 20140711 2 j_willey_ercc2.0_workshop

Mix Native Targets (NT) and Synthetic Internal Standards (IS) in Varying Ratios

Multiplex Competitive PCR Amplicon Library Prep


Target-specific forward or reverse priming sites (identical between Native Target and Internal Standard)

Nucleotide substitutions allow for discrimination of Native Target from synthetic cDNA Internal Standard (IS)

Page 7: 20140711 2 j_willey_ercc2.0_workshop

Multiplex Competitive PCR Amplicon Library Prep

Mix Native Targets (NT) and Synthetic cDNA Internal Standards (IS) in Varying Ratios

Page 8: 20140711 2 j_willey_ercc2.0_workshop

Titration of Synthetic cDNA Internal Standards Relative to gDNA or cDNA

[Figure: Internal Standards Titrate. Y-axis: Ratio of Sequencing Counts NT:IS. X-axis: Internal Standard Concentration (copies per library preparation). Series: Endogenous cDNA, ERCC controls, gDNA.]

Fine (half-log) titration to assess analytical performance
• NT:IS ratio linear from <1:100 to >100:1
• Measure >10,000-fold expression range with single IS concentration (e.g., 10⁵ IS copies/library, circled)

Page 9: 20140711 2 j_willey_ercc2.0_workshop

Standardized RNA Sequencing (STARSEQ) Data Analysis

Empirically Supported Conclusions
• Synthetic cDNA IS are in a fixed relationship relative to each other.
• Competition between each NT and its respective IS preserves the original concentration for each NT
• >10,000-fold expression range may be measured with single IS concentration
• The proportional relationship among native targets in the original sample is preserved during amplification and sequencing
• Measurement relative to IS controls for variation in PCR efficiency and downstream library preparation steps

Data Analysis: Calculation of native target abundance in sample
• Determine ratio of sequencing counts for NT and IS (NT:IS)
• Multiply NT:IS ratio by internal standard (IS) concentration in amplicon library prep
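The two data-analysis steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' software: the function name, argument names, and the read counts in the example are all hypothetical.

```python
def native_target_copies(nt_reads, is_reads, is_copies_in_library):
    """Estimate native target (NT) copies from sequencing counts.

    Step 1: ratio of NT reads to internal standard (IS) reads.
    Step 2: multiply that ratio by the known IS copies in the library prep.
    """
    if is_reads <= 0:
        raise ValueError("need at least one internal standard read")
    nt_to_is_ratio = nt_reads / is_reads
    return nt_to_is_ratio * is_copies_in_library


# Hypothetical target: 2,500 NT reads, 500 IS reads, 1e5 IS copies loaded.
estimate = native_target_copies(2500, 500, 1e5)
print(f"Estimated NT copies in library prep: {estimate:.0f}")
```

Because only the NT:IS ratio enters the calculation, the absolute read depth for a target does not matter as long as both NT and IS are counted.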

Page 10: 20140711 2 j_willey_ercc2.0_workshop

ERCC Reference Materials to Assess Accuracy of Targeted NGS

• Excellent signal-to-analyte response (slope close to 1.0) for 26 measured ERCC RM targets
• Correlation coefficient >0.94

Correlation With Known ERCC Concentration

Page 11: 20140711 2 j_willey_ercc2.0_workshop

Systematic Differences in ERCC RM Measured/Expected Values
• Inter-assay variation in systematic difference
• Multiple possible causes of difference
  • Inter-assay difference in RT efficiency
  • Inter-assay difference in quantification of synthetic RNA (ERCC) or cDNA internal standard.

Page 12: 20140711 2 j_willey_ercc2.0_workshop

ERCC Reference Materials to Assess Accuracy of Targeted NGS

• Increased correlation after correction for systematic differences in measured values
  • Correlation coefficient >0.99

Correlation With Known ERCC Concentration

Correlation After Correction for Systematic Variation

Page 13: 20140711 2 j_willey_ercc2.0_workshop

ERCC Reference Materials to Assess Targeted NGS Fold-Change Measurement Capability

• Excellent fold-change detection consistent with precision and signal response
• Known concentration of each ERCC RM served as true values
  • Sensitivity: Correct/Incorrect classification of two SEQC/ERCC dilutions known to be different.
  • 1-Specificity: Incorrect/Correct classification of two SEQC/ERCC dilutions known to be not different.

Page 14: 20140711 2 j_willey_ercc2.0_workshop

Method Reproducibility

• MAQC Sample A Endogenous Targets
  • Inter-day (A)
  • Inter-Lab (B)
  • Inter-library (C)
  • Inter-Lab and library (D)
• MAQC Sample Accuracy (observed relative to expected)
  • Sample C endogenous targets (E)
  • Sample D endogenous targets (F)

[Figure: panels A to F]

Page 15: 20140711 2 j_willey_ercc2.0_workshop

STARSEQ Inter-platform Comparison

Page 16: 20140711 2 j_willey_ercc2.0_workshop

ERCC Reference Materials to Assess Effect of Stochastic Sampling on Precision

• Stochastic sampling effect on analytical variation increases below 1000 copies loaded
• Same effect observed with all platforms in MAQC study

[Figure panels: "NGS Assessment Using Known ERCC Synthetic RNA Concentrations"; "MAQC Cross-Platform Comparison Using known synthetic cDNA concentrations"]
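As a rough illustration of why variation rises at low copy number, the sketch below computes the coefficient of variation of a pure Poisson count, which is 1/sqrt(mean). This is textbook Poisson behavior, not the fitted model from the talk, and the copy numbers are hypothetical.

```python
import math


def poisson_cv(expected_copies):
    """CV of a Poisson count with the given mean: sqrt(mean) / mean."""
    if expected_copies <= 0:
        raise ValueError("expected_copies must be positive")
    return 1.0 / math.sqrt(expected_copies)


# Sampling noise alone grows quickly as fewer copies are loaded.
for copies in (10, 100, 1_000, 10_000):
    print(f"{copies:>6} copies loaded: Poisson CV about {poisson_cv(copies):.1%}")
```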

Page 17: 20140711 2 j_willey_ercc2.0_workshop

Hypothesis: The coefficient of variation (CV) for amplicon-based NGS assay measurements may be partly predicted by Poisson (i.e. stochastic) sampling effects for a nucleic acid target at two key points:
1) input molarity (i.e. number of intact molecules)
2) sequencing coverage (i.e. read counts).

Page 18: 20140711 2 j_willey_ercc2.0_workshop

Design: Based on a two-point stochastic sampling model, we derived equations for expected coefficient of variation.

Model 1 - Number of sequence reads dictates assay CV:

$$\text{Expected CV} = -1 + 10^{\left(\text{Sequence Reads}^{-0.54}\right)}$$

Model 2 - Number of molecules input dictates assay CV:

$$\text{Expected CV} = -1 + 10^{\left(\text{Molecules Input}^{-0.54}\right)}$$

Model 3 - Number of molecules input and sequence reads dictates assay CV:

$$\text{Expected CV} = -1 + 10^{\left(\text{Molecules Input}^{-0.54} + \text{Sequence Reads}^{-0.54} - [\text{Molecules Input} \times \text{Sequence Reads}]^{-0.54}\right)}$$
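The three expected-CV models can be evaluated numerically with the sketch below. It assumes the slide's equations read as 10 raised to the bracketed term, minus 1, with each count raised to the power -0.54; that reading of the extracted text is an assumption. The function names and example inputs are hypothetical.

```python
EXPONENT = -0.54  # exponent as printed in the slide's equations


def cv_from_reads(sequence_reads):
    """Model 1: expected CV driven by sequence reads only (reads > 0)."""
    return 10 ** (sequence_reads ** EXPONENT) - 1


def cv_from_molecules(molecules_input):
    """Model 2: expected CV driven by molecules input only (molecules > 0)."""
    return 10 ** (molecules_input ** EXPONENT) - 1


def cv_from_both(molecules_input, sequence_reads):
    """Model 3: expected CV from both sampling points combined."""
    combined = (
        molecules_input ** EXPONENT
        + sequence_reads ** EXPONENT
        - (molecules_input * sequence_reads) ** EXPONENT
    )
    return 10 ** combined - 1


# Hypothetical (molecules, reads) pairs to compare the three models.
for molecules, reads in [(10, 10_000), (1_000, 10_000), (1_000, 100)]:
    print(
        f"molecules={molecules:>5} reads={reads:>6} "
        f"M1={cv_from_reads(reads):.3f} "
        f"M2={cv_from_molecules(molecules):.3f} "
        f"M3={cv_from_both(molecules, reads):.3f}"
    )
```

Under this reading, Model 3 is never smaller than Model 1 or Model 2 for inputs of at least 1, so whichever sampling point is more limiting dominates the combined estimate.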

Page 19: 20140711 2 j_willey_ercc2.0_workshop

Design:

• These expectation models were tested against data derived from cross-mixtures of two cell lines, H23 and H520
  • These lines were tested and known to be homozygous for opposite alleles at four polymorphic sites:
    • rs769217, rs1042522, rs735482 and rs2298881
• The cell lines were mixed to produce limiting molecule inputs
• Following multiplex competitive PCR the library preparations were diluted to produce limiting sequencing inputs

Page 20: 20140711 2 j_willey_ercc2.0_workshop

Expected CV Models 1 to 3 (equations as on Page 18)

(46 quadruplicate measurements; R² = -0.70)

Measured CV 13-fold higher than expected

Page 21: 20140711 2 j_willey_ercc2.0_workshop

Expected CV Models 1 to 3 (equations as on Page 18)

(46 quadruplicate measurements; R² = 0.24)

Measured CV 1.5-fold higher than expected

Page 22: 20140711 2 j_willey_ercc2.0_workshop

Expected CV Models 1 to 3 (equations as on Page 18)

(46 quadruplicate measurements; R² = 0.74)

Measured CV 1.01-fold ratio compared to expected

Page 23: 20140711 2 j_willey_ercc2.0_workshop

Effect of Low Molecule Input on Measurement Reliability

ERCC1 - rs735482
Important for Nucleotide Excision Repair and in some studies indicative of oxaliplatin response in ovarian cancer
• When high molecule copy loaded, excellent measured/expected accuracy
• When low molecule copy loaded, no meaningful measured/expected relationship

Page 24: 20140711 2 j_willey_ercc2.0_workshop

Summary

• For a variety of reasons many clinical specimens are
  • Small samples
  • Poor quality DNA or RNA and therefore have low fraction of measurable target molecules.
  • Limited to one technical replicate.
• Based on findings presented here, knowledge of both
  • Nucleic acid molecule input amount and
  • Sequencing coverage
  is important for accurate quantitative reporting in NGS

Page 25: 20140711 2 j_willey_ercc2.0_workshop

Targeted NGS Reduction in Over-Sampling Due to Convergence During PCR

• (A) Each abundant or rare analyte native template (NT) is measured relative to known copy of respective synthetic cDNA IS
• (B) Abundant NT and IS amplify in parallel and plateau at early cycle due to primer limitation. Rare NT and IS have sufficient primer to continue amplifying and therefore converge towards abundant NT and IS
• (C) Numerical representation of graph in (B)
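The convergence described in (A) to (C) can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions that are not from the slides: each target's NT and IS share one primer pool, every new strand consumes one primer, and amplification is perfect doubling until the pool runs out. The copy numbers, pool size, and function name are hypothetical.

```python
def amplify_pair(nt_copies, is_copies, primer_pool, cycles=35):
    """Toy primer-limited PCR for one target's NT and IS.

    Both templates double each cycle while primers last. In the cycle where
    the pool runs short, both are scaled by the same fraction, so the NT:IS
    ratio is unchanged.
    """
    nt, is_ = float(nt_copies), float(is_copies)
    primers_left = float(primer_pool)
    for _ in range(cycles):
        demand = nt + is_  # primers needed to double every template
        if demand <= 0 or primers_left <= 0:
            break
        fraction = min(1.0, primers_left / demand)
        nt_new, is_new = nt * fraction, is_ * fraction
        nt, is_ = nt + nt_new, is_ + is_new
        primers_left -= nt_new + is_new
    return nt, is_


# Hypothetical abundant and rare targets, same primer pool per target.
abundant = amplify_pair(2e6, 1e6, primer_pool=1e10)
rare = amplify_pair(50, 100, primer_pool=1e10)

for name, (nt, is_) in (("abundant", abundant), ("rare", rare)):
    print(f"{name:>8}: total product {nt + is_:.3e}, NT:IS ratio {nt / is_:.2f}")
```

In this toy model the two targets start 20,000-fold apart in total copies but end at nearly the same product amount, while each NT:IS ratio stays at its starting value.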

Page 26: 20140711 2 j_willey_ercc2.0_workshop

ERCC Reference Materials to Assess Reduction in Over-sampling

• There was a reduction in number of sequencing reads required to measure all analytes in library
  • 6.9 × 10³-fold reduction in number to measure the ERCC RM targets
  • 1.6 × 10⁴-fold reduction in number to measure endogenous targets

Page 27: 20140711 2 j_willey_ercc2.0_workshop

SUMMARY OF PERFORMANCE CHARACTERISTICS
• High reproducibility (R²=0.997)
  • 97% accuracy to detect 2-fold change (measured with ERCC)
• High inter-day, inter-site, inter-library concordance (R²>0.97)
• High cross platform concordance with:
  • Taqman qPCR (R²=0.96)
  • Whole transcriptome RNA-sequencing following traditional library prep with Illumina NGS kits (R²=0.94)
• Convergence during PCR reduces sequencing reads required
  • Quantify >100 targeted transcripts expressed over 10⁷-fold
    • Whole transcriptome: 2.3 × 10⁹ sequencing reads
    • Targeted method: 1.4 × 10⁵ sequencing reads
    • More than 10,000-fold reduction
• Reveals stochastic sampling contribution to analytical variation
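As a quick check of the fold-reduction claim, reading the slide's read counts as 2.3 × 10⁹ (whole transcriptome) and 1.4 × 10⁵ (targeted method), with the superscripts assumed lost in extraction:

$$\frac{2.3 \times 10^{9}}{1.4 \times 10^{5}} \approx 1.6 \times 10^{4}$$

That is roughly a 16,000-fold reduction, consistent with "more than 10,000-fold".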

Page 28: 20140711 2 j_willey_ercc2.0_workshop

Rationale for LCRT-AGx

• Recently completed National Lung Screening Trial results
  • NEJM, July 2011 and http://www.cancer.gov/nlst
  • Reported 20% reduction in mortality resulting from three annual CTs.
    • This would translate into prevention of 30,000 deaths/year in US alone.
    • Projections indicate that with 5-6 annual CTs 80,000 deaths/year can be prevented.
• Annual CT screening is now standard of care
  • based on recommendations of United States Preventive Services Task Force (USPSTF), American Cancer Society (ACS), American Thoracic Society (ATS), and National Comprehensive Cancer Network (NCCN)
• Yet all consensus groups urged efforts to establish biomarkers that better identify individuals at highest risk
  • cited costs and high false positive rate and associated complications

Page 29: 20140711 2 j_willey_ercc2.0_workshop
Page 30: 20140711 2 j_willey_ercc2.0_workshop

Prospective Multi-Site Validation Trial of LCRT

RC2 CA148572

• Multi-institutional prospective nested case-control study of 14-gene lung cancer risk test.
• Mayo Clinic, University of Michigan, Ohio State University, Henry Ford Hospital, Vanderbilt, Tennessee VA Hospital, University of Toledo, Toledo Hospital, Cleveland Clinic (Pending: University of Colorado/National Jewish Hospital, Fairfax/Inova, Wayne State, others)
• Will be completed in 2014-15.
• American Recovery and Reinvestment Act (ARRA) funding
• If RC2 CA148572 study validates previous results, will submit to FDA for approval for commercial use as a diagnostic test for lung cancer risk
• Individuals with positive Lung Cancer Risk Test will be candidates for trials that aim to identify lung cancer at early stage through screening by CAT scan of chest

Page 31: 20140711 2 j_willey_ercc2.0_workshop

Need to Develop Better Gene Expression Platform!
• Necessary characteristics
  • Higher throughput
  • Less expensive
  • Quality controlled
  • Use less RNA
  • Tolerate lower quality RNA
• Choice
  • Targeted Next-Generation RNAseq
  • Multiplex competitive PCR amplicon libraries
    • STARSEQ

Page 32: 20140711 2 j_willey_ercc2.0_workshop

Development of Lung Cancer Risk Test (LCRT) on Standardized RNA Sequencing (STARSEQ) Platform

• STARSEQ method highly correlated with capillary electrophoresis (CE) method used to report LCRT performance in Blomquist et al. (Cancer Research, 2009)

Page 33: 20140711 2 j_willey_ercc2.0_workshop

ERCC Reference Materials to Optimize RT Efficiency

• A Reverse Transcription Standards Mixture (RTSM) was prepared by mixing known concentrations of ERCC171 RNA and ERCC113 cDNA
• Following RT, the ratio of ERCC171/ERCC113 cDNA is used as a measure of RT efficiency.
• This controls for inter-sample variation in RT interference, and inter-experimental variation in reagent (e.g. RT enzyme) quality/quantity
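One way to turn the ERCC171/ERCC113 ratio into an efficiency number is sketched below. The slide only says the post-RT ratio is used as the measure; dividing by the input ratio to get a fraction is an assumption made here for illustration. The function name and copy numbers are hypothetical.

```python
def rt_efficiency(ercc171_cdna_measured, ercc113_cdna_measured,
                  ercc171_rna_input, ercc113_cdna_input):
    """Fraction of ERCC171 RNA converted to cDNA, normalized to ERCC113 cDNA.

    ERCC113 is added as cDNA, so it is unaffected by the RT step and serves
    as the reference for how much ERCC171 cDNA the RT reaction produced.
    """
    if min(ercc113_cdna_measured, ercc171_rna_input, ercc113_cdna_input) <= 0:
        raise ValueError("reference and input amounts must be positive")
    measured_ratio = ercc171_cdna_measured / ercc113_cdna_measured
    input_ratio = ercc171_rna_input / ercc113_cdna_input
    return measured_ratio / input_ratio


# Hypothetical RTSM: 1e6 copies ERCC171 RNA and 1e5 copies ERCC113 cDNA added;
# after RT, 2e5 copies ERCC171 cDNA and 1e5 copies ERCC113 cDNA are measured.
efficiency = rt_efficiency(2e5, 1e5, 1e6, 1e5)
print(f"Estimated RT efficiency: {efficiency:.0%}")
```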

[Figure panels: "Effect of RNA concentration and RT priming method on yield of cDNA"; "Effect of RNA input on RH-primed RT efficiency (ERCC 171/113 cDNA)"]

Page 34: 20140711 2 j_willey_ercc2.0_workshop

Summary
• Targeted NGS Method that Employs Synthetic cDNA Internal Standards
  • Has excellent
    • Linear dynamic range
    • Signal-to-analyte response
    • Precision
    • Accuracy
    • Reproducibility with other platforms
  • Markedly reduces sequencing counts required
• Direct measurement of copies loaded and sequencing counts in each diagnostic assay is critical to ensure
  • avoidance of stochastic sampling variation and reliable measurement
• Application of targeted NGS method with synthetic DNA cDNA internal standards
  • Enabled development of a reliable, low-cost lung cancer risk test (LCRT)
• ERCC reference materials used effectively in testing for reverse transcription efficiency

Page 35: 20140711 2 j_willey_ercc2.0_workshop

Acknowledgements

University of Toledo: Thomas Blomquist, M.D./Ph.D.; Erin Crawford, M.S.; Jeff Hammersley, M.D.; Dan Olson, M.D., Ph.D.; Ragheb Assaly, M.D.; Younsook Yoon, M.D.; DA Hernandez; Lauren Stanoszek, B.A.
Accugenomics, Inc.: Tom Morrison, Ph.D.; Brad Austermiller, B.S.; Nick Lazaridis, Ph.D.
Vanderbilt: Pierre Massion, M.D./Ph.D.
Mayo Clinic: Dave Midthun, M.D.
Henry Ford Hospital System: Chris Johnson, Ph.D.; Albert Levin, Ph.D.; Paul Kvale, M.D.; Mike Simoff, M.D.
Ohio State University: Patrick Nana-Sinkam, M.D./Ph.D.
University of Michigan: Doug Arenberg, M.D./Ph.D.
MUSC: Gerard Silvestri, M.D./Ph.D.
Cleveland Clinic: Peter Mazzone, M.D.
Inova/Fairfax: Steven Nathan
Toledo Hospital: Ron Wainz, M.D.
Mercy/St. Vincent’s Hospital: Jim Tita, M.D.