Considerations for Analyzing Targeted NGS Data - BRCA, Tim Hague, CTO
Omixon Workshops: Considerations for Analyzing Targeted NGS Data - BRCA, Tim Hague, CEO
BRCA1 and BRCA2 are best known as 'cancer susceptibility' genes
In fact, the proteins they encode repair damage in DNA
Large number of known deleterious mutations
Disproportionate number of indels
Introduction
Mary-Claire King discovered BRCA1 and BRCA2 and published their function
Myriad Genetics won the patent
History
[Figure: Distribution of Known BRCA1 Deletions >3 bp; x-axis: indel size (nt)]
Dominique Stoppa-Lyonnet at the Curie Institute:
"Large scale deletions could account for as many as one-third of all BRCA1 mutations in some populations"
BRCA1 and BRCA2 are tumor suppressor genes. Deleterious mutations carry an 82% lifetime chance of developing breast/ovarian cancer
Science 2004, 306:2187-2191
>1,500 deleterious BRCA mutations
17 kbp coding region with a mutation rate of 1/2000
NGS-based BRCA screening: Leeds (UK), Newgene (UK), Ghent (Belgium)
DIY genetic test published by Salzberg
82% chance of cancer
>90% chance of being false positive/negative
False negatives must be avoided
Precision of both the sequencing data and the data analysis is key
Looking for indels – indel detection abilities are a key criterion
Repeats are also an issue in the BRCA region
What kind of NGS data?
BRCA Repeats
Homopolymer errors look like small indels and can cause noise
Problem for: Roche 454, Ion Torrent
Homopolymer Errors
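As a rough illustration of why homopolymer runs matter, the sketch below lists the homopolymer runs in a sequence and checks whether a called indel position falls inside one. The function names, the minimum run length of 4 and the example sequence are assumptions made for illustration; they are not part of the original analysis.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=4):
    """Return (start, end, base) for each run of a single base of at least min_len.

    Coordinates are 0-based and end-exclusive. min_len=4 is an arbitrary choice.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, pos + length, base))
        pos += length
    return runs


def in_homopolymer(indel_pos, runs):
    """True if a 0-based indel position lies inside any of the listed runs."""
    return any(start <= indel_pos < end for start, end, _ in runs)


# Made-up sequence: a run of six A's and a run of five T's.
example = "ACGT" + "A" * 6 + "GC" + "T" * 5 + "CG"
runs = homopolymer_runs(example)
print(runs)
print(in_homopolymer(6, runs), in_homopolymer(1, runs))
```

Small indel calls that fall inside such runs in Roche 454 or Ion Torrent data could then be treated with extra caution rather than discarded outright.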
Read length is a limiting factor for insertion detection
When searching for indels, long reads can help. Long reads can also help with repeats
Roche 454 has the longest reads
Long Reads
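A back-of-the-envelope sketch of why read length limits insertion detection: inserted bases are absent from the reference, so they cannot help place the read, and only the flanking bases can. The function name and the requirement of 20 matching bases on each side are illustrative assumptions, not properties of any particular aligner.

```python
def max_spanned_insertion(read_len, min_anchor=20):
    """Largest insertion a single read can contain while still keeping
    min_anchor reference-matching bases on each side of it.

    min_anchor=20 is an assumed value chosen only for illustration.
    """
    return max(0, read_len - 2 * min_anchor)


for read_len in (50, 100, 250, 400):
    print(read_len, max_spanned_insertion(read_len))
```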
Real Examples With Roche 454 Data
Paired reads can also help to increase effective 'read length'
Illumina MiSeq now has a 2x250 bp protocol
Paired Reads
Comparison of 9 open source and commercial NGS analysis tools
In silico test with mutated reference BRCA gene
2211 known BRCA variants
1341 SNPs, 320 insertions and 551 deletions
Full GATK pipeline used for variant calling, including quality recalibration and indel realignment
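The in silico setup can be pictured with a small sketch: apply one known variant to a reference sequence to obtain a mutated reference from which test reads could then be simulated. This is an illustrative assumption about the general approach, not the workshop's actual code; `apply_variant`, its VCF-like arguments and the toy sequence are all hypothetical.

```python
def apply_variant(reference, pos, ref_allele, alt_allele):
    """Return the reference with ref_allele at 0-based pos replaced by alt_allele.

    SNP: ref and alt have equal length; insertion: alt is longer;
    deletion: alt is shorter.
    """
    end = pos + len(ref_allele)
    if reference[pos:end] != ref_allele:
        raise ValueError("ref allele does not match the reference at this position")
    return reference[:pos] + alt_allele + reference[end:]


toy_ref = "ACGTACGTAC"  # made-up sequence, not BRCA
snp = apply_variant(toy_ref, 3, "T", "G")
insertion = apply_variant(toy_ref, 3, "T", "TGG")
deletion = apply_variant(toy_ref, 3, "TACG", "T")
print(snp, insertion, deletion)
```

Because the introduced variants are known in advance, every call made on the simulated reads can be scored as found or missed.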
Overall Sensitivity: 99.2% Paired End (PE), 94.5% Single End (SE)
SNPs Found: 99.5% PE, 99.5% SE
Deletions Found: 98.5% PE, 85.5% SE
Insertions Found: 99.4% PE, 89.4% SE
BWA
False Negatives: 17 Paired End (PE), 121 Single End (SE)
False Positives: 23 PE, 168 SE
The longest (60bp+) deletions were not found, either with PE or SE data
BWA
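The reported BWA sensitivities can be cross-checked from the false negative counts with simple arithmetic. The sketch assumes sensitivity is TP / (TP + FN), with TP taken as the 2211 known variants minus the false negatives; the precision values it prints are derived here from the reported counts and did not appear on the original slides.

```python
TOTAL_VARIANTS = 2211  # known BRCA variants in the in silico test


def sensitivity(false_negatives, total=TOTAL_VARIANTS):
    """Fraction of known variants that were found."""
    true_positives = total - false_negatives
    return true_positives / total


def precision(false_negatives, false_positives, total=TOTAL_VARIANTS):
    """Fraction of reported variants that are real (derived, not from the slides)."""
    true_positives = total - false_negatives
    return true_positives / (true_positives + false_positives)


for label, fn, fp in (("Paired End", 17, 23), ("Single End", 121, 168)):
    print(f"{label}: sensitivity {sensitivity(fn):.1%}, "
          f"precision {precision(fn, fp):.1%}")
```

Under these assumptions the computed sensitivities round to the 99.2% and 94.5% quoted for paired-end and single-end data.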
Indel Sizes – BWA Single End
Indel Sizes – BWA Paired End
Most other alignment tools showed a similar trend – much better results overall with Paired data
Only two of the tools tested found the longest deletions, even with Paired data
Other Tools
Much better for reliable variant detection than equivalent length single reads
Provided much better coverage in the BRCA region (spanning small repeats)
If available, paired reads should be preferred
Paired Reads - Conclusions
Not all tools are good at finding indels
Burrows-Wheeler based aligners can't find indels beyond a few base pairs in single reads, but can make better use of paired data if indel realignment is also used
They still can't detect the longest indels (there is just a gap in coverage)
If indel detection is required, an indel sensitive tool should be used
Indel Detection - Conclusions
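Since a long deletion shows up to these aligners only as a gap in coverage, one crude follow-up is to scan per-base depth for uncovered stretches inside the target. The sketch below is an illustration under stated assumptions: the depth list is made up, the function name is hypothetical, and the 60 bp minimum length is borrowed from the size of the deletions that were missed. A gap can also reflect failed capture or amplification rather than a deletion.

```python
def coverage_gaps(depths, min_len=60, max_depth=0):
    """Return 0-based, end-exclusive (start, end) intervals in which depth is
    at most max_depth for at least min_len consecutive positions."""
    gaps = []
    start = None
    for i, depth in enumerate(depths):
        if depth <= max_depth:
            if start is None:
                start = i  # a low-coverage stretch begins here
        else:
            if start is not None and i - start >= min_len:
                gaps.append((start, i))
            start = None
    # Handle a gap that runs to the end of the region.
    if start is not None and len(depths) - start >= min_len:
        gaps.append((start, len(depths)))
    return gaps


# Made-up depth profile: one 75 bp gap and one 10 bp gap.
depths = [30] * 100 + [0] * 75 + [28] * 100 + [0] * 10 + [25] * 50
print(coverage_gaps(depths))
```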
None of the alignment tools found all the variants
It will almost certainly require the same data to be analyzed with more than one tool, to get sufficiently accurate results
Overall - Conclusions
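One simple way to combine results from several tools is to compare their call sets: which variants every tool reports, and which only some report. The sketch below assumes each call has been reduced to a (chromosome, position, ref, alt) tuple; the tool names and calls are invented for illustration.

```python
from collections import Counter


def compare_call_sets(calls_by_tool):
    """Count how many tools support each variant.

    calls_by_tool maps a tool name to a set of (chrom, pos, ref, alt) tuples.
    Returns (consensus, partial): variants called by all tools, and variants
    called by only some of them.
    """
    support = Counter()
    for calls in calls_by_tool.values():
        support.update(calls)  # each set contributes one vote per variant
    n_tools = len(calls_by_tool)
    consensus = {v for v, n in support.items() if n == n_tools}
    partial = {v for v, n in support.items() if n < n_tools}
    return consensus, partial


# Hypothetical calls from two hypothetical tools.
calls = {
    "tool_a": {("chr17", 100, "A", "G"), ("chr17", 250, "CTT", "C")},
    "tool_b": {("chr17", 100, "A", "G")},
}
consensus, partial = compare_call_sets(calls)
print(consensus, partial)
```

When false negatives are the main concern, the variants in `partial` are the ones worth reviewing rather than dropping. Indels may also need to be normalized to a common representation before tuples from different tools can be compared directly.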
Download our Omixon Target™ Evaluation Version today
OMIXON.COM