Targeted Data Introduction Many mapping, alignment and variant calling algorithms Most of these...

Omixon Workshops

Considerations for Analyzing Targeted NGS Data - Introduction

Tim Hague, CEO

Targeted Data

Introduction

There are many mapping, alignment and variant calling algorithms

Most of these were developed for whole genome sequencing and, to some extent, population genetic studies

Premise

In contrast, NGS-based diagnostics deals with particular genes or mutations in an individual

Different diagnostic targets present specific challenges

Goal

Present analysis issues related to differences in:

Sequencing technologies

Targeting technologies

Target specifics

Pseudogenes and segmental duplication

NGS Sequencers

Illumina

Ion Torrent

Roche 454

(SOLiD)

Mind The Gap

Moore B, Hu H, Singleton M, De La Vega FM, Reese MG, Yandell M. Genet Med. 2011 Mar;13(3):210-7.

Sequencing Technology

Differences:

Homopolymer error rates

G/C content errors

Read length

Sequencing protocols (single vs paired reads)
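Two of these properties, G/C content and homopolymer runs, can be measured directly on a read or a target sequence. The following is a minimal sketch, not part of the original slides; the function names and the example read are hypothetical.

```python
from itertools import groupby
from typing import Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> Tuple[str, int]:
    """Return (base, length) of the longest run of one repeated base."""
    best_base, best_len = "", 0
    for base, run in groupby(seq.upper()):
        run_len = sum(1 for _ in run)
        if run_len > best_len:
            best_base, best_len = base, run_len
    return best_base, best_len


read = "ACGGGGGGTTAC"  # hypothetical read, for illustration only
print(gc_fraction(read))
print(longest_homopolymer(read))
```

Sequences with long runs or extreme G/C fractions are the ones where platform-specific error behaviour is most likely to matter.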

Targeting Methods

PCR primers (e.g. amplicons)

Hybridization probes (e.g. exome kits)

Targeting Technology

Differences:

Exact matching regions vs regions with SNPs

Results in:

Need for mapping against whole chromosomes to avoid false positives

Analysis Targets

Differences:

Rate of polymorphism

Repetitive structures

Mutation profiles

G/C content

Single genes vs multi gene complexes

BRCA1/2: 1/2000

HLA: 1/29

CFTR: 1/2000

Distributions of insertions and deletions

Distribution of repeat elements

Segmental Duplications

Sometimes called Low Copy Repeats (LCRs)

Highly homologous, >95% sequence identity

Rare in most mammals

Comprise a large portion of the human genome (and other primate genomes)

Important for understanding HLA
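The identity threshold above can be illustrated with a small calculation. This is a sketch only: it assumes the two sequences have already been aligned to equal length (with "-" marking gaps), it ignores any length criterion, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns where both sequences carry the same base.

    Assumes a and b are pre-aligned to the same length; '-' marks a gap
    and never counts as a match.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def exceeds_identity_threshold(a: str, b: str, threshold: float = 95.0) -> bool:
    """True if the pair is more than `threshold` percent identical."""
    return percent_identity(a, b) > threshold


copy_a = "ACGT" * 25                 # hypothetical 100 bp segment
copy_b = list(copy_a)
copy_b[0], copy_b[10], copy_b[20] = "C", "T", "C"  # introduce 3 mismatches
copy_b = "".join(copy_b)
print(percent_identity(copy_a, copy_b))
print(exceeds_identity_threshold(copy_a, copy_b))
```

Reads from two copies this similar differ at only a handful of positions, which is why they are easy to place on the wrong copy.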

Segmental Duplications

Many LCRs are concentrated in "hotspots"

Recombinations in these regions are responsible for a wide range of disorders, including:

– Charcot-Marie-Tooth syndrome type 1A

– Hereditary neuropathy with liability to pressure palsies

– Smith-Magenis syndrome

– Potocki-Lupski syndrome

Data analysis shouldn’t be like this!

Data Analysis Tools

Differences:

Detection rates of complex variants (sensitivity)

False positive rates (accuracy)

Speed

Ease of use
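Detection rate and false positive rate can be estimated by comparing a tool's calls against a trusted set of variants. The sketch below is an illustration under stated assumptions: variants are represented as (chromosome, position, ref, alt) tuples and must match exactly, and the truth and call sets are made up.

```python
def evaluate_calls(called, truth):
    """Compare called variants with a truth set using exact tuple matching."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true positives: called and real
    fp = len(called - truth)   # false positives: called but not real
    fn = len(truth - called)   # false negatives: real but missed
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        # Sensitivity: share of real variants that were detected.
        "sensitivity": tp / (tp + fn) if truth else 0.0,
        # Precision: share of calls that are real (1 - precision is the
        # fraction of calls that are false positives).
        "precision": tp / (tp + fp) if called else 0.0,
    }


# Hypothetical data, for illustration only.
truth = {
    ("chr17", 100, "A", "G"),
    ("chr17", 200, "C", "T"),
    ("chr17", 300, "AT", "A"),
    ("chr17", 400, "G", "C"),
}
called = {
    ("chr17", 100, "A", "G"),
    ("chr17", 200, "C", "T"),
    ("chr17", 500, "T", "C"),
}
print(evaluate_calls(called, truth))
```

Exact matching is deliberately strict: the same indel written at a different position or with different padding would count as both a miss and a false positive, so real comparisons usually normalize variants first.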

“Depending upon which tool you use, you can see pretty big differences between even the same genome called with different tools—nearly as big as the two Life Tech/Illumina genomes.”

Mark Yandell in BioIT-World.com, June 8, 2011

Examples

Missing variants

SNPs, a DNP and deletions

Identify More Valid Variants

Find Homopolymer Indels

Examples

Coverage differences

Four Times Exon Coverage

[Figure: coverage tracks with data ranges 0-432 and 0-96]

Higher Exome Coverage

[Figure: coverage tracks with data ranges 0-24 and 0-10]

First Conclusion

Read accuracy is not the limiting factor in accurate variant analysis

Example - Dense Region of SNPs

Second Conclusion

As variant density increases, the performance of most tools goes down
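One simple way to quantify variant density is to count how many variants fall within a fixed-size window. The sketch below is illustrative only: the window size is an arbitrary assumption, the positions are made up, and the function name is hypothetical.

```python
def max_variants_in_window(positions, window):
    """Largest number of variant positions within any span of `window` bp.

    Two positions share a window when they differ by less than `window`.
    """
    positions = sorted(positions)
    best = 0
    left = 0
    for right, pos in enumerate(positions):
        # Shrink the window from the left until it spans fewer than `window` bp.
        while pos - positions[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best


# Hypothetical variant positions on one chromosome.
variant_positions = [100, 105, 110, 500, 900, 905]
print(max_variants_in_window(variant_positions, window=20))
```

A region whose count is high relative to the read length is one where a single read may carry several mismatches against the reference.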

Variant Calling

There are a few popular variant callers: GATK, SAMtools mpileup, VarScan

The most comprehensive (GATK) has a whole pipeline, including a quality recalibration step and an indel realignment step

Running these recalibration and realignment steps before any variant calling is highly recommended

Deduplication and removing non-primary alignments may also be required
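The last point can be sketched with the FLAG field of a SAM record, whose bit values are defined in the SAM format specification. This is a minimal illustration, not the slides' pipeline: it assumes a duplicate-marking tool has already set the duplicate bit, it treats both secondary and supplementary records as non-primary, and the function names and sample records are hypothetical.

```python
# Bit values from the SAM format specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800

EXCLUDED = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_DUPLICATE | FLAG_SUPPLEMENTARY


def keep_for_calling(flag: int) -> bool:
    """True for mapped, primary alignments not marked as duplicates."""
    return (flag & EXCLUDED) == 0


def filter_sam_lines(lines):
    """Yield header lines unchanged and only the alignments worth keeping."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header line, always kept
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if keep_for_calling(flag):
            yield line


# Hypothetical records: primary, secondary, and duplicate alignments.
sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchr6\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r2\t355\tchr6\t100\t0\t50M\t=\t300\t250\t*\t*",
    "r3\t1123\tchr6\t100\t60\t50M\t=\t300\t250\t*\t*",
]
for kept in filter_sam_lines(sam):
    print(kept.split("\t")[0])
```

In practice this filtering is usually done by an existing tool on BAM files; the point here is only to show which flag bits are involved.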

Indel Realigner Problem

Variants That Can be Hard to Find

DNPs

TNPs

Small indels next to SNPs

30+ bp indels

Homopolymer indels

Homopolymer indel and SNP together

Indels in palindromes

Dense regions of variants
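Several of these categories can be recognised from the REF and ALT alleles and the surrounding reference sequence. The sketch below is an assumption-laden illustration: it classifies by allele length only (so it expects normalized alleles), uses 0-based positions, and all names and example sequences are hypothetical.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant by allele lengths: SNP, DNP, TNP, MNP or indel."""
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt or ref == alt:
        raise ValueError("ref and alt must be non-empty and different")
    if len(ref) == len(alt):
        # Same-length substitutions of 1, 2 or 3 bases; longer ones are MNPs.
        return {1: "SNP", 2: "DNP", 3: "TNP"}.get(len(ref), "MNP")
    return "insertion" if len(alt) > len(ref) else "deletion"


def homopolymer_run_at(reference: str, pos: int) -> int:
    """Length of the run of identical bases containing 0-based `pos`."""
    base = reference[pos]
    left = pos
    while left > 0 and reference[left - 1] == base:
        left -= 1
    right = pos
    while right + 1 < len(reference) and reference[right + 1] == base:
        right += 1
    return right - left + 1


reference = "ACGTTTTTTAC"  # hypothetical reference fragment
print(classify_variant("AC", "GT"))
print(classify_variant("GT", "G"))
print(homopolymer_run_at(reference, 5))
```

Combining the two checks (an indel whose position sits inside a long run) gives a rough way to flag homopolymer indels for closer inspection.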

Contact

Tim Hague, CEO

Omixon Biocomputing Solutions

[email protected]

+36 70 318 4878

Download our Omixon Target™ Evaluation Version Today

OMIXON.COM