SNAP: Fast, accurate sequence alignment enabling biological applications

Ravi Pandya, Microsoft Research

ASHG 10/19/2014

SNAP is fast*
Align 50x genome in 1.2 hours (BWA-MEM: 11.75 hours)
Sort + index + markdup BAM in 2 hours (samtools + sambamba: 4.25 hours)

SNAP is as accurate as BWA-MEM, Bowtie2, etc.
ROC on simulated data
% aligned on real data
Variant calls on real data

* NA12878:ERR194147, Azure D14 (16 cores, 112GB RAM, 800GB SSD)

Sequence alignment

The problem: given a read R and a reference genome G, find the position p in G that minimizes EditDistance(R, G[p .. p + |R|]).
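The problem statement can be made concrete with a brute-force sketch. It slides a window of |R| characters along G, reading G[p .. p + |R|] as the half-open slice genome[p:p + len(read)] (an assumption), and scores every window with the textbook dynamic-programming edit distance. The function names are hypothetical. At O(|G| · |R|²) work per read this is hopeless for a 3 Gb genome, so the practical goal is to avoid most of these comparisons and make the remaining ones cheaper.

```python
from typing import Tuple


def edit_distance(a: str, b: str) -> int:
    """Classic O(|a| * |b|) Levenshtein distance (unit-cost substitutions and indels)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (ca != cb))  # match / substitution
        prev = cur
    return prev[-1]


def naive_align(read: str, genome: str) -> Tuple[int, int]:
    """Try every start position p and return (best_p, best_distance)."""
    best_p, best_d = -1, len(read) + 1
    for p in range(len(genome) - len(read) + 1):
        d = edit_distance(read, genome[p:p + len(read)])
        if d < best_d:                              # keep the first minimum found
            best_p, best_d = p, d
    return best_p, best_d
```

For example, naive_align("ACGT", "TTACGTTT") returns (2, 0).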

SNAP solves this quickly and accurately through:
Efficient system architecture
Reducing the number of comparisons
Reducing the cost of comparisons

System architecture

[Figure: pipeline diagram with stages async read, align, sort, temp file, merge sort, mark duplicates, compress and async write; buffers are labeled full and empty]
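The sort, temp file and merge sort stages in the diagram follow the classic external merge sort pattern. Below is a minimal Python sketch that rests on simplifying assumptions. Each record is a hypothetical (position, read name) pair rather than a real BAM record. Duplicates are defined only as records sharing a position, whereas real duplicate marking also considers mate position and orientation. The async I/O and compression stages are omitted. This is an illustration of the pattern, not SNAP's implementation.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a BAM record: (alignment position, read name).
Record = Tuple[int, str]


def _write_run(records: List[Record], tmpdir: str, idx: int) -> str:
    """Sort one in-memory chunk and spill it to a temp file, one record per line."""
    path = os.path.join(tmpdir, f"run{idx}.tsv")
    with open(path, "w") as f:
        for pos, name in sorted(records):
            f.write(f"{pos}\t{name}\n")
    return path


def _read_run(path: str) -> Iterator[Record]:
    """Stream a sorted run back from disk."""
    with open(path) as f:
        for line in f:
            pos, name = line.rstrip("\n").split("\t")
            yield int(pos), name


def sort_and_markdup(records: Iterable[Record],
                     chunk_size: int = 1_000_000) -> List[Tuple[int, str, bool]]:
    """External merge sort by position, then flag records whose position repeats the previous one."""
    with tempfile.TemporaryDirectory() as tmpdir:
        runs: List[str] = []
        chunk: List[Record] = []
        for rec in records:
            chunk.append(rec)
            if len(chunk) == chunk_size:            # buffer full: sort and spill
                runs.append(_write_run(chunk, tmpdir, len(runs)))
                chunk = []
        if chunk:
            runs.append(_write_run(chunk, tmpdir, len(runs)))

        out: List[Tuple[int, str, bool]] = []
        last_pos = None
        # k-way merge of the sorted runs; duplicates are marked in the same streaming pass.
        for pos, name in heapq.merge(*(_read_run(p) for p in runs)):
            out.append((pos, name, pos == last_pos))
            last_pos = pos
    return out
```

Marking duplicates during the merge avoids a separate pass over the sorted output, which is the same motivation for chaining merge sort and mark duplicates in one pipeline.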

The sequence alignment problem

The easy part: 97% of 20-mers in the human genome occur only once, but they account for only 75% of locations.

The hard part: the other 3% of 20-mers and 25% of locations.
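The gap between 97% and 75% comes from counting two different things: distinct 20-mers versus genome positions. A repeated k-mer counts once among distinct k-mers but occupies many positions. A small sketch that computes both fractions for any sequence (the function name is hypothetical):

```python
from collections import Counter
from typing import Tuple


def kmer_uniqueness(genome: str, k: int) -> Tuple[float, float]:
    """Return (fraction of distinct k-mers seen once, fraction of positions whose k-mer is unique)."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    distinct_unique = sum(1 for c in counts.values() if c == 1) / len(counts)
    positions_unique = sum(1 for km in kmers if counts[km] == 1) / len(kmers)
    return distinct_unique, positions_unique
```

On the toy sequence ACGTACGTTT with k = 4, ACGT occurs twice, so 5 of the 6 distinct 4-mers are unique but only 5 of the 7 positions are.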

[Figure: CDF of per-read/pair alignment time, NA18705, 169M pairs (using deeper search parameters than current defaults). Curves: single equally weighted, paired equally weighted, single time weighted, paired time weighted. Annotation: 10% of reads take 95% of time.]
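The time-weighted curves answer a different question than the equally weighted ones: what share of total alignment time goes to a given fraction of reads. Below is a sketch of that calculation. It uses made-up timings rather than the NA18705 data, and the function name is hypothetical.

```python
from typing import List


def time_share_of_slowest(times: List[float], frac_reads: float) -> float:
    """Fraction of total alignment time spent on the slowest `frac_reads` of reads."""
    ordered = sorted(times, reverse=True)             # slowest reads first
    n = max(1, round(len(ordered) * frac_reads))      # how many reads are in that fraction
    return sum(ordered[:n]) / sum(ordered)
```

With 90 reads taking 1 time unit each and 10 reads taking 100 units each, the slowest 10% of reads account for 1000/1090 (about 92%) of the total time. The figure shows the same kind of skew.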

Bill Bolosky, MSR

Hash table lookup

Build a multi-valued map (~30GB for hg19) from each seed S in G to all locations of S in G.

For all seeds in the read and all locations of each seed in the genome, score the implied alignment of the read and keep the best: 330 reads/s

Ignore frequent seeds (>300 occurrences) and use only a few seeds per read: 14k reads/s
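Here is a toy version of the seed-and-score loop. It assumes fixed-length exact-match seeds stored in a Python dict; SNAP's real index is a far more compact hash table. The parameter names max_hits and n_seeds are hypothetical, and max_hits defaults to the 300-occurrence cutoff. Hamming distance stands in for edit distance to keep the sketch short, so it handles only substitutions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def build_index(genome: str, seed_len: int) -> Dict[str, List[int]]:
    """Multi-valued map from every seed (k-mer) in the genome to all of its positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for p in range(len(genome) - seed_len + 1):
        index[genome[p:p + seed_len]].append(p)
    return dict(index)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions (a stand-in for edit distance)."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read: str, genome: str, index: Dict[str, List[int]], seed_len: int,
               max_hits: int = 300, n_seeds: int = 3) -> Tuple[Optional[int], int]:
    """Look up a few seeds, skip frequent ones, score each implied placement, keep the best."""
    best_p: Optional[int] = None
    best_score = len(read) + 1
    step = max(1, (len(read) - seed_len) // max(1, n_seeds - 1))
    scored = set()
    for offset in range(0, len(read) - seed_len + 1, step)[:n_seeds]:
        hits = index.get(read[offset:offset + seed_len], [])
        if len(hits) > max_hits:                # ignore highly repetitive seeds
            continue
        for hit in hits:
            p = hit - offset                    # read start implied by this seed hit
            if p < 0 or p + len(read) > len(genome) or p in scored:
                continue
            scored.add(p)
            score = hamming(read, genome[p:p + len(read)])
            if score < best_score:
                best_p, best_score = p, score
    return best_p, best_score
```

On the toy genome AAAACCCCGGGGTTTTACGT with 4-base seeds, the read CCCCGGGA is placed at position 4 with one mismatch.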


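Fast scoring

Replace the O(n²) edit-distance computation with Ukkonen's O(nd) algorithm, where n = read length and d = min(limit, actual distance); use limit = best score so far + 2 (for MAPQ): 92k reads/s

Sort candidates by number of seed hits: 113k reads/s

Skip locations where the number of seed misses exceeds the limit: 154k reads/s (470x overall)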
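This slide combines two ideas: a distance computation that gives up once it exceeds a limit, and an ordering of candidates that lets most of them be skipped. Both can be sketched in Python. banded_edit_distance fills only the diagonal band |i - j| ≤ limit of the DP matrix. This is the cutoff idea behind Ukkonen's O(nd) bound, not a faithful port of his algorithm or of SNAP's scorer. best_alignment and its candidates argument, a dict mapping each start position to its seed-hit count, are hypothetical. The early break treats each missed seed as costing at least one edit. That holds for substitutions in non-overlapping seeds but is only a heuristic once indels shift seed positions.

```python
from typing import Dict, Optional, Tuple


def banded_edit_distance(a: str, b: str, limit: int) -> Optional[int]:
    """Edit distance computed only inside the diagonal band |i - j| <= limit.

    Returns the distance if it is <= limit, otherwise None. The work is
    O(len(a) * limit) rather than O(len(a) * len(b)).
    """
    n, m = len(a), len(b)
    if abs(n - m) > limit:
        return None
    cap = limit + 1                          # any value above `limit` counts as "too far"
    prev = [j if j <= limit else cap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [cap] * (m + 1)
        if i <= limit:
            cur[0] = i
        for j in range(max(1, i - limit), min(m, i + limit) + 1):
            cur[j] = min(prev[j] + 1,                            # deletion
                         cur[j - 1] + 1,                         # insertion
                         prev[j - 1] + (a[i - 1] != b[j - 1]))   # match / substitution
        if min(cur) > limit:                 # every path already exceeds the limit
            return None
        prev = cur
    return prev[m] if prev[m] <= limit else None


def best_alignment(read: str, genome: str, candidates: Dict[int, int],
                   n_seeds: int, slack: int = 2) -> Tuple[Optional[int], Optional[int]]:
    """Score candidate start positions, most seed hits first (hypothetical input format)."""
    best_p: Optional[int] = None
    best_d: Optional[int] = None
    limit = len(read)                        # nothing scored yet
    for p, hits in sorted(candidates.items(), key=lambda kv: -kv[1]):
        if n_seeds - hits > limit:           # heuristic: each missed seed implies >= 1 edit
            break                            # later candidates have even more misses
        d = banded_edit_distance(read, genome[p:p + len(read)], limit)
        if d is not None and (best_d is None or d < best_d):
            best_p, best_d = p, d
            # Near-best hits stay scorable (for MAPQ); this sketch does not compute MAPQ.
            limit = best_d + slack
    return best_p, best_d
```

Sorting by seed hits pays off twice. Strong candidates are scored first, which tightens the limit early. Because later candidates only have more misses, the first one that fails the seed-miss test ends the loop.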


Paired-end alignment

Find and score candidate location pairs:
C(R1:R2) = C(R1) ∩ C(R2) {± insert size}, i.e. candidate locations of R1 and R2 that lie within the insert size of each other
Enumerate in O(h log n), where h = |C(R1) ∩ C(R2)| and n = |C(R1)| + |C(R2)|
Increases accuracy by allowing a much higher limit on seed occurrences (e.g. 4k vs 300)
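Here is a sketch of pairing candidates from the two mates. It assumes each mate's candidate positions are already sorted and treats "within the insert size" as |p1 - p2| ≤ max_insert, ignoring strand and orientation. Walking the smaller list and binary-searching the larger one costs roughly O(min(|C(R1)|, |C(R2)|) · log n + h). That makes it a simple stand-in, not necessarily the exact enumeration behind the O(h log n) bound. The function name is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def candidate_pairs(c1: List[int], c2: List[int], max_insert: int) -> List[Tuple[int, int]]:
    """Return every (p1, p2) with p1 in c1, p2 in c2 and |p1 - p2| <= max_insert.

    c1 and c2 must be sorted. The loop walks the smaller list and
    binary-searches the larger one.
    """
    swap = len(c1) > len(c2)
    small, large = (c2, c1) if swap else (c1, c2)
    pairs: List[Tuple[int, int]] = []
    for p in small:
        lo = bisect_left(large, p - max_insert)
        hi = bisect_right(large, p + max_insert)
        for q in large[lo:hi]:
            pairs.append((q, p) if swap else (p, q))   # keep (R1 position, R2 position) order
    return pairs
```

Pairing lets the aligner tolerate far more occurrences per seed. A repetitive seed in one mate is disambiguated by whether the other mate has a candidate nearby.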


Results: simulated data

Mason-generated paired-end 100bp reads

Results: real data

NA18507 (Illumina HiSeq 50x)

* AWS cr1.8xlarge (32 cores, 244GB RAM, 2x120GB SSD)

Results: GATK variant calls

Broad GATK pipeline, curated NA12878 variant calls

Results: NIST Genome-in-a-Bottle

Appistry GATK pipeline, GIAB highly confident calls
Longer seeds are much faster, with similar precision/recall

ERR194147*.fastq.gz, Azure D14 (16 cores, 112GB RAM, 800GB SSD)

Results: NIST Genome-in-a-Bottle

Lower confidence calls (qual>20, 2 platforms)

Highly confident calls
Aligner   Indel recall  Indel precision  SNP recall  SNP precision
bwa-mem   97.24%        97.15%           99.57%      99.65%
snap-20   97.04%        97.48%           99.51%      99.57%
snap-24   97.04%        97.46%           99.52%      99.57%
snap-28   97.04%        97.45%           99.53%      99.57%
snap-32   97.00%        97.41%           99.51%      99.57%

Lower confidence calls
Aligner   Indel recall  Indel precision  SNP recall  SNP precision
bwa-mem   96.38%        96.30%           99.00%      99.32%
snap-20   96.17%        96.68%           98.94%      99.25%
snap-24   96.17%        96.67%           98.95%      99.23%
snap-28   96.16%        96.62%           98.96%      99.21%
snap-32   96.11%        96.55%           98.94%      99.17%
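Recall (TP / (TP + FN)) and precision (TP / (TP + FP)) are often summarized by their harmonic mean, F1. The results do not include F1. The sketch below only shows how it could be derived from the indel columns of the highly confident table, with the values copied from that table.

```python
def f1(recall: float, precision: float) -> float:
    """Harmonic mean of recall = TP / (TP + FN) and precision = TP / (TP + FP)."""
    return 2 * recall * precision / (recall + precision)


# Indel (recall %, precision %) from the highly confident GIAB table.
indel_high_conf = {
    "bwa-mem": (97.24, 97.15),
    "snap-20": (97.04, 97.48),
    "snap-24": (97.04, 97.46),
    "snap-28": (97.04, 97.45),
    "snap-32": (97.00, 97.41),
}

for aligner, (r, p) in indel_high_conf.items():
    print(f"{aligner}: indel F1 = {f1(r, p):.2f}%")
```

Because F1 weights recall and precision equally, it is one way to compare rows where one aligner leads on recall and another on precision.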

Pathogen ID: SURPI (Charles Chiu, UCSF)

“This analysis of DNA sequences required just 96 minutes. A similar analysis conducted with the use of previous generations of computational software on the same hardware platform would have taken 24 hours or more to complete, Chiu said.”

SNAP enables SURPI with:
Fast filtering mode
64-bit index for >40GB ntDB
Secondary mapping output
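One plausible reading of "64-bit index for >40GB ntDB" is simple arithmetic: an unsigned 32-bit genome location can address at most 2^32 (about 4.29 billion) positions. A short check follows. It assumes roughly 3.1 billion bases for hg19 and treats the >40GB nt database as about 40 billion bases. The function name is hypothetical.

```python
# 2**32 distinct offsets fit in an unsigned 32-bit genome location.
MAX_32BIT_LOCATIONS = 2 ** 32


def needs_64bit_locations(total_bases: int) -> bool:
    """True if some genome offset (0 .. total_bases - 1) would not fit in an unsigned 32-bit integer."""
    return total_bases > MAX_32BIT_LOCATIONS


# Assumptions: hg19 is roughly 3.1 billion bases; ">40GB ntDB" is read as ~40 billion bases (1 byte per base).
for name, size in [("hg19", 3_100_000_000), ("nt database", 40_000_000_000)]:
    print(name, "needs 64-bit locations:", needs_64bit_locations(size))
```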

Charles Chiu, UCSF

Acknowledgements

Microsoft Research: Bill Bolosky, Ravi Pandya
UC San Francisco: Taylor Sittler
Broad Institute: Christopher Hartl

UC Berkeley AMPLab: Matei Zaharia, Kristal Curtis, Armando Fox, Scott Shenker, Ion Stoica, David Patterson

Binaries, source, and documentation (Apache 2.0 licensed): http://snap.cs.berkeley.edu
