MKT0006-2020-02 Redefining “Gold Standard” Ultra-Sensitive ......(ITD) from 33-300 bp, as well...



[Figure: TwinStrand Duplex Sequencing™ Technology. Panel "Sequencing Errors Obscure Truth" compares Next-Generation Sequencing (NGS), Single Strand Error-Corrected NGS, and Duplex Sequencing. Schematic: a source DNA molecule (Truth) carries DuplexSeq™ Tags; top and bottom strands are amplified and sequenced; PCR copies are grouped by unique tag and strand; top and bottom strands are compared; the Duplex consensus eliminates errors.]

Abstract

Redefining “Gold Standard”: Ultra-Sensitive Characterization of Commercial DNA Standards with Duplex Sequencing™

Jacob Higgins, PhD¹ – Gabriel Pratt, PhD¹ – Charles C. Valentine, MS¹ – Lindsey N Williams, PhD¹ – and Jesse J. Salk, MD, PhD¹,²

¹TwinStrand Biosciences, Seattle, WA – ²Division of Medical Oncology, University of Washington, Seattle, WA

Conclusions

• Duplex Sequencing of cancer driver genes in apheresis DNA from a healthy 18-year-old control reveals a background mutation frequency below 1-in-2-million, which is biological, not technical, in origin.

• Cell lines have a higher background mutation frequency and a greater proportion of clonal (multi-count) background mutations than DNA from a healthy young donor, even when the donor DNA is sequenced to 3x greater depth.

• For mutation mixes, using DNA from a young donor as the base results in a lower background mutation frequency than using cell line DNA as the base diluent.

• Duplex Sequencing readily detects single-nucleotide variants (SNVs) as well as large insertions and deletions at ultra-low frequencies.

• Given sufficient sequencing depth, Duplex Sequencing detects mutations at frequencies from 1/30,000 to 1/100,000 with high sensitivity and specificity.
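As a rough illustration of what "sufficient sequencing depth" means, the sketch below assumes that alternate-allele Duplex counts at a site follow a Poisson distribution with mean depth × VAF. This model and the function names are assumptions for illustration, not the authors' method. It computes the smallest depth that gives a chosen probability of observing at least one supporting Duplex read.

```python
import math


def depth_for_detection(vaf: float, prob: float = 0.95) -> int:
    """Smallest Duplex depth D with P(at least 1 alt count) >= prob.

    Assumes alt counts ~ Poisson(D * vaf), so P(>=1) = 1 - exp(-D * vaf).
    Solving 1 - exp(-D * vaf) >= prob gives D >= -ln(1 - prob) / vaf.
    """
    if not 0 < vaf < 1 or not 0 < prob < 1:
        raise ValueError("vaf and prob must lie strictly between 0 and 1")
    return math.ceil(-math.log(1 - prob) / vaf)


def detection_probability(depth: int, vaf: float) -> float:
    """P(at least one alt count) at a given depth under the same Poisson model."""
    return 1 - math.exp(-depth * vaf)


for vaf in (1 / 30_000, 1 / 100_000):
    d = depth_for_detection(vaf)
    print(f"VAF {vaf:.2e}: ~{d:,}x for a 95% chance of >=1 alt Duplex read")
```

Requiring two or more supporting reads, as a variant caller typically would, pushes the required depth higher than this one-read lower bound.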

DNA standards are an essential resource in diagnostic and research labs. However, as genomic technologies become more sensitive, the criteria that define a “gold standard” have not been carefully re-evaluated. Cell culture exposes DNA to oxidative damage and clonal selection, and synthetic DNA has an inherently higher mutation frequency than biologically derived genomic DNA.

Here we apply ultra-accurate Duplex Sequencing to characterize the background mutation frequency in two commercial mutation standards, in control genomic DNA from a healthy young donor, and in custom mutation mixes that use that young donor's DNA as a base. We observe that synthetic (highest) and cell line (intermediate) DNA-based mixes exhibit significantly elevated background mutation frequencies relative to carefully selected genomic DNA, or to mutation mixes in which genomic DNA is the diluent into which patient- or cell-line-derived mutant molecules are spiked.

When employing high-sensitivity genomic technologies that can resolve mutations at levels below one-in-one-million, rigorous attention to the techniques and substrates used to manufacture mutation standards is critical.

Contact: https://twinstrandbio.com/contact/

Duplex Sequencing of Custom In-House AML Mutation Mix

MKT0006 v1.0

A DuplexSeq™ Adapter has:
1. Identical (or relatable) degenerate tags in each strand.
2. An asymmetry allowing independent strand identification.
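The two adapter properties above are what allow PCR copies to be grouped by tag and strand and then compared across strands. The following minimal sketch of that consensus logic uses illustrative assumptions throughout. Reads are assumed to be pre-aligned, of equal length, and oriented to the top strand. Tags are assumed to be already normalized so that both strands share one key. The thresholds (`min_fraction=0.7`, `min_reads_per_strand=2`) and all names are hypothetical, not TwinStrand's actual parameters or software.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical read record: (tag, strand, sequence). The tag is assumed to be
# normalized so both strands of one source molecule share it; strand is "A"
# (top) or "B" (bottom), as resolved by the adapter asymmetry. Sequences are
# assumed to be pre-aligned, equal length, and oriented to the top strand.
Read = Tuple[str, str, str]


def strand_consensus(seqs: List[str], min_fraction: float = 0.7) -> str:
    """Single-strand consensus (SSCS): majority base per position, or 'N'
    when no base reaches min_fraction of the reads in the family."""
    bases = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        bases.append(base if count / len(column) >= min_fraction else "N")
    return "".join(bases)


def duplex_consensus(reads: List[Read], min_reads_per_strand: int = 2) -> Dict[str, str]:
    """Group PCR copies by tag and strand, build one SSCS per strand, and keep
    a base only where the top- and bottom-strand consensuses agree."""
    families: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for tag, strand, seq in reads:
        families[tag][strand].append(seq)

    duplexes: Dict[str, str] = {}
    for tag, strands in families.items():
        top, bottom = strands.get("A", []), strands.get("B", [])
        if len(top) < min_reads_per_strand or len(bottom) < min_reads_per_strand:
            continue  # both strands are required to form a Duplex consensus
        a, b = strand_consensus(top), strand_consensus(bottom)
        duplexes[tag] = "".join(x if x == y and x != "N" else "N" for x, y in zip(a, b))
    return duplexes


reads: List[Read] = [
    # Family ACGT: one PCR error in a single top-strand copy is outvoted.
    ("ACGT", "A", "GATTACA"), ("ACGT", "A", "GATTACA"),
    ("ACGT", "A", "GATTACA"), ("ACGT", "A", "GATGACA"),
    ("ACGT", "B", "GATTACA"), ("ACGT", "B", "GATTACA"),
    # Family TTTT: a change present on only one strand is masked as 'N'.
    ("TTTT", "A", "GATGACA"), ("TTTT", "A", "GATGACA"),
    ("TTTT", "B", "GATTACA"), ("TTTT", "B", "GATTACA"),
    # Family CCCC: top strand only, so no Duplex consensus is formed.
    ("CCCC", "A", "GATTACA"), ("CCCC", "A", "GATTACA"),
]
print(duplex_consensus(reads))
```

The second family shows why the strand comparison matters. An error copied into every read of one strand family survives single-strand consensus, but it is removed when the two strands disagree.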

In CS-A, DS VAF correlated strongly with vendor ddPCR measurements (r² = 0.98). In CS-B, DS VAF correlated more weakly with vendor-reported values: r² = 0.74 vs. vendor NGS quantification and r² = 0.57 vs. vendor ddPCR assessment. Of note, DS successfully identified FLT3 internal tandem duplications (ITD) from 33-300 bp, as well as other indels up to 23 bp.
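The poster reports r² values without stating how they were computed. The sketch below assumes the square of the Pearson correlation coefficient on linear-scale VAF pairs, which is one common choice. The VAF pairs in the usage example are hypothetical and are not taken from the poster.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between paired measurements, e.g. Duplex VAF
    (x) versus a vendor ddPCR or NGS VAF (y) for the same mutations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical VAF pairs (fractions), not data from the poster.
duplex = [0.05, 0.10, 0.20, 0.30]
vendor = [0.06, 0.09, 0.22, 0.28]
print(round(r_squared(duplex, vendor), 3))
```

For a mix that spans several orders of magnitude, such as the ABL1 mix, correlating log10-transformed VAFs is a reasonable alternative. The poster does not say which scale was used.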

A custom ABL1 mutation mix was prepared by mixing ABL1-mutant DNA into control DNA from a healthy 18-year-old donor. Nine mutations were spiked in at target frequencies from 1/1,000 to 1/30,000. The ABL1 mutation mix was Duplex Sequenced to a mean Duplex depth of 158,823x, and the control DNA to a mean Duplex depth of 139,819x. All mutations were detected at close to expected frequencies (r² = 0.93). Error bars represent 95% confidence intervals. The overall background mutation frequency was 5.4 × 10⁻⁷ in both the mutation mix and the negative control. The negative control had zero alternate allele counts at all nine ABL1 spike-in mutations.
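The poster does not state how the 95% confidence intervals on VAF were calculated. One common choice for a proportion estimated as alternate counts / depth is the Wilson score interval, shown below as an assumption. The counts in the usage example are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(alt_count: int, depth: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a VAF estimated as alt_count / depth.

    z = 1.96 gives an approximately 95% interval. This is one common method,
    assumed here for illustration; it is not necessarily the one used on the poster.
    """
    if depth <= 0 or not 0 <= alt_count <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_count <= depth")
    p = alt_count / depth
    z2 = z * z
    denom = 1 + z2 / depth
    center = (p + z2 / (2 * depth)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / depth + z2 / (4 * depth * depth))
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts: 10 alternate Duplex reads at 150,000x depth.
low, high = wilson_interval(10, 150_000)
print(f"VAF {10 / 150_000:.2e}, 95% CI {low:.2e} to {high:.2e}")
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1]. It also remains informative when only a handful of alternate reads are observed, which is the typical situation for spike-ins near 1/30,000.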

CS-A and CS-B were sequenced to mean Duplex depths of 7,012x and 8,178x, respectively. Negative control DNA from a healthy 18-year-old never-smoker was sequenced to a mean Duplex depth of 24,751x. Background mutations were compared among CS-A, CS-B, and the control. Despite being sequenced to approximately 3-fold greater Duplex depth, the control DNA had fewer low-frequency clonal mutations (mutations with < 1% VAF and ≥ 2 alternate allele counts) than CS-A or CS-B, both in the number of mutant sites and as a proportion of total background mutations. The control also had a lower overall background frequency (4.7 × 10⁻⁷) than either CS-A or CS-B.
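The background metrics above can be sketched from per-site calls. The clonal definition (< 1% VAF and ≥ 2 alternate counts) comes from the text. Several other choices are assumptions: counting every alternate Duplex read (rather than each unique site once) in the frequency numerator, computing the clonal share over sites, and removing known spike-ins beforehand. The record layout and the example calls are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BackgroundCall:
    """A non-reference Duplex call at one site (hypothetical record layout)."""
    alt_count: int  # Duplex reads carrying the alternate allele
    depth: int      # Duplex depth at the site


def summarize_background(
    calls: List[BackgroundCall],
    total_duplex_bases: int,
    max_vaf: float = 0.01,
    min_clonal_count: int = 2,
) -> Dict[str, float]:
    """Overall background frequency and clonal share among sub-1% VAF calls.

    Assumes known/expected mutations have already been removed from `calls`.
    Calls at >= max_vaf (e.g. germline variants) are excluded from the background.
    """
    low = [c for c in calls if c.depth > 0 and c.alt_count / c.depth < max_vaf]
    clonal = [c for c in low if c.alt_count >= min_clonal_count]
    return {
        "background_frequency": sum(c.alt_count for c in low) / total_duplex_bases,
        "clonal_sites": len(clonal),
        "clonal_fraction": len(clonal) / len(low) if low else 0.0,
    }


# Hypothetical calls, not data from the poster; the last one is a germline-like
# 50% VAF site that falls outside the sub-1% background.
calls = [
    BackgroundCall(1, 20_000), BackgroundCall(1, 25_000),
    BackgroundCall(3, 24_000), BackgroundCall(1, 22_000),
    BackgroundCall(12_000, 24_000),
]
print(summarize_background(calls, total_duplex_bases=10_000_000))
```

Because deeper sequencing samples more molecules, it would be expected to reveal more multi-count sites, not fewer. This is why the control's lower clonal count despite roughly 3-fold greater depth is notable.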

Duplex Sequencing (DS) libraries from commercial standard A (CS-A) and commercial standard B (CS-B) were captured with probes targeting 29 genes recurrently mutated in acute myeloid leukemia (AML). CS-A is a mix of cell line DNA with known mutations, and CS-B is a mix of synthetic mutant DNA fragments spiked into DNA from a single cell line. Fifteen mutations at predicted variant allele frequencies (VAF) from 5-40% were captured in CS-A, and 10 mutations at predicted VAF of 5-15% in CS-B. All mutations were identified by DS.

[Figure: CS-A, CS-B and Control compared by Duplex Depth (Mean, Max), Overall Background Mutation Frequency, Number of Low Frequency Clonal Mutations, and % Clonal vs. Single-Count Mutations <1% VAF.]

ABL1 Spike-In Mutation   Target VAF   Duplex VAF
E255K                    9.07E-04     7.34E-04
V299L                    5.14E-04     4.86E-04
F317I                    2.91E-04     2.73E-04
Y257C                    2.76E-04     4.22E-04
F317L                    1.41E-04     8.02E-05
ex08+78                  4.73E-05     8.67E-05
F359V                    4.72E-05     6.36E-05
E499E                    3.85E-05     5.82E-05
T315I                    3.38E-05     7.33E-05

[Figure: ABL1 Mix vs. Control Duplex Depth (Mean, Peak) and Background Mutation Frequency; ABL1 Mutation Mix Duplex VAF vs. Predicted VAF on log scales (1E-05 to 1E-02).]

[Figure: Duplex VAF vs. vendor measurements. CS-A: Duplex VAF vs. ddPCR VAF (Vendor), r² = 0.98. CS-B: Duplex VAF vs. Standard NGS VAF (Vendor), r² = 0.74. Per-mutation VAF bar charts for CS-A (Target, ddPCR (Vendor), Duplex Seq) and CS-B (Target, ddPCR (Vendor), Standard NGS (Vendor), Duplex Seq).]

Duplex Sequencing of Commercially Available DNA Mutation Standards

Duplex Sequencing of Custom In-House ABL1 Mutation Mix

Background Mutations in Commercial Standards vs. Negative Control

Duplex Sequencing Correlation with ddPCR and Standard NGS

[Figure: CS-B Duplex VAF vs. ddPCR VAF (Vendor), r² = 0.57.]

[Figure: alternate allele signal (VAF, 0.00% to 0.10%) at each base position in the 584 bp target region, analyzed with No Error Correction and with Single Strand Consensus Sequencing (SSCS).]

Gene   Mutation   Predicted VAF
TP53   Q136fs     1 × 10⁻²
TP53   R249S      2 × 10⁻³
TP53   R248Q      1 × 10⁻³
NRAS   Q61R       2 × 10⁻⁴
TP53   G245S      1 × 10⁻⁴
NRAS   Q61L       4 × 10⁻⁵
TP53   R175H      2 × 10⁻⁵
KRAS   G12D       1 × 10⁻⁵
KRAS   G12V       1 × 10⁻⁵

DNA from 9 cell lines, each harboring a unique mutation in KRAS, NRAS and/or TP53, was mixed into negative control DNA such that expected frequencies in the DNA mix ranged from 1/100 to 1/100,000. Replicate libraries were Duplex Sequenced, and the data were combined for over 1 million total Duplex depth. Sequencing data were analyzed with no error correction, with single-strand error correction only, or with Duplex error correction. Plots below represent alternate allele signal at each position in the 584 bp target region.
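The planning arithmetic for a mix like this can be sketched as follows. If every DNA in the mix has the same copy number at each locus, a source contributes a VAF equal to its mass fraction times its own VAF at that locus. The required mass fraction is therefore target VAF / source VAF. The target VAFs below are the predicted values from the table above. The source VAF of 1/2 (a heterozygous cell line) is a hypothetical placeholder, because the poster does not report the cell lines' own allele fractions. Copy-number changes in real cell lines would break the equal-copy-number assumption.

```python
from fractions import Fraction
from typing import Dict


def mixing_fractions(
    target_vaf: Dict[str, Fraction],
    source_vaf: Dict[str, Fraction],
) -> Dict[str, Fraction]:
    """Fraction of total DNA mass to take from each mutant source, plus the diluent.

    Assumes every DNA in the mix has the same copy number at each locus, so the
    VAF contributed by a source is (its mass fraction) x (its own VAF).
    """
    parts = {name: target_vaf[name] / source_vaf[name] for name in target_vaf}
    total = sum(parts.values(), Fraction(0))
    if total >= 1:
        raise ValueError("targets cannot be reached with these source VAFs")
    parts["negative_control"] = 1 - total  # remainder is the diluent DNA
    return parts


# Target VAFs from the table; source VAFs of 1/2 (heterozygous) are hypothetical.
targets = {
    "TP53 Q136fs": Fraction(1, 100), "TP53 R249S": Fraction(1, 500),
    "TP53 R248Q": Fraction(1, 1_000), "NRAS Q61R": Fraction(1, 5_000),
    "TP53 G245S": Fraction(1, 10_000), "NRAS Q61L": Fraction(1, 25_000),
    "TP53 R175H": Fraction(1, 50_000), "KRAS G12D": Fraction(1, 100_000),
    "KRAS G12V": Fraction(1, 100_000),
}
sources = {name: Fraction(1, 2) for name in targets}
mix = mixing_fractions(targets, sources)
for name, frac in mix.items():
    print(f"{name}: {float(frac):.4%}")
```

Using `Fraction` keeps the arithmetic exact, so the mass fractions of all sources and the diluent sum to exactly 1.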

[Figure: alternate allele signal (VAF) at each base position in the 584 bp target region with Duplex Error Correction, including zoomed panels. Labeled spike-ins: TP53 Q136fs 1:100, TP53 R249S 1:500, TP53 R248Q 1:1,000, NRAS Q61R 1:5,000, TP53 G245S 1:10,000, NRAS Q61L 1:25,000, TP53 R175H 1:50,000, KRAS G12D 1:100,000, KRAS G12V 1:100,000.]

All spiked-in mutations were identified above background signal in the Duplex data, down to VAFs of 0.001%. Mutations below approximately 0.1% are obscured in the SSCS data, and all mutations are obscured in the uncorrected data. Every position in the uncorrected and SSCS data has background signal, whereas the vast majority of Duplex base positions have zero background counts. The overall background frequency of non-spiked-in variants with Duplex Sequencing is 5.8 × 10⁻⁷, which reflects a biological, rather than technical, background.
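One way to quantify "above background signal" is to ask how likely a given alternate count would be from background alone. The sketch below makes two simplifying assumptions. First, it treats the overall Duplex background frequency of 5.8 × 10⁻⁷ as a uniform per-base rate at every site, although real background varies by position and substitution type. Second, it models background counts as Poisson. The 1,000,000x per-site depth is illustrative, echoing the "over 1 million total Duplex depth" above. Function names are hypothetical.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail.
    Intended for the small means (lam well below k) relevant to background."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total, i = 0.0, k
    while term > total * 1e-17:  # stop once further terms are negligible
        total += term
        i += 1
        term *= lam / i
    return total


def background_p_value(alt_count: int, depth: int, background_freq: float = 5.8e-7) -> float:
    """Probability of seeing >= alt_count alternate Duplex reads at one site from
    background alone, assuming each Duplex base independently carries a
    background change with probability background_freq."""
    return poisson_upper_tail(alt_count, depth * background_freq)


# Illustrative: a 1:100,000 spike-in at 1,000,000x is expected to give ~10 alt
# reads, while background alone is expected to give ~0.58.
print(f"{background_p_value(10, 1_000_000):.1e}")
```

Summing the upper tail directly, rather than computing 1 minus the cumulative probability, avoids losing precision when the tail probability is very small.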

For Research Use Only. Not for use in diagnostic procedures. ©2020 TwinStrand Biosciences, Inc. All rights reserved. All trademarks are the property of TwinStrand Biosciences, Inc. or their respective owners.