Bioinformatics tools for viral quasispecies reconstruction from next-generation sequencing data and vaccine optimization

PD: Ion Măndoiu, UConn
Co-PDs: Mazhar Khan, UConn; Rachel O’Neill, UConn; Alex Zelikovsky, GSU


Page 2:

Outline
• Background & aims of the project
• Bioinformatics tools for quasispecies spectrum reconstruction from NGS reads
• Experimental validation on IBV data
• Summary and ongoing work

Page 3:

Infectious Bronchitis Virus (IBV)
• Group 3 coronavirus
• Biggest single cause of economic loss in US poultry farms
  – Young chickens: coughing, tracheal rales, dyspnea
  – Broiler chickens: reduced growth rate
  – Layers: egg production drops 5-50%; eggs are thin-shelled with watery albumen
• Worldwide distribution, with dozens of serotypes in circulation
  – Co-infection with multiple serotypes is not uncommon, creating conditions for recombination

[Figures: IBV-infected embryo vs. normal embryo; IBV-infected egg defects]

Page 4:

IBV Vaccination
Broadly used, most commonly with attenuated live vaccine
• Short-lived protection
• Layers need to be re-vaccinated multiple times during their lifespan
• Vaccines might undergo selection in vivo and regain virulence [Hilt, Jackwood, and McKinley 2008]

Page 5:

RNA Virus Replication

High mutation rate (~10^-4)

Lauring & Andino, PLoS Pathogens 2011
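To make the ~10^-4 figure concrete, the short calculation below estimates how many mutations a single genome copy would carry. It is only a sketch under two assumptions that are not on the slide: that the rate is per nucleotide per replication, and that the genome is about 27,600 nt long (an approximate IBV genome size). The function names are illustrative.

```python
def expected_mutations(rate_per_site: float, genome_length: int) -> float:
    """Expected number of mutations introduced when one genome is copied."""
    return rate_per_site * genome_length


def prob_at_least_one_mutation(rate_per_site: float, genome_length: int) -> float:
    """Probability that a copy differs from its template at one or more sites,
    assuming sites mutate independently of each other."""
    return 1.0 - (1.0 - rate_per_site) ** genome_length


if __name__ == "__main__":
    RATE = 1e-4            # from the slide; the "per site per replication" unit is assumed
    GENOME_LENGTH = 27_600  # assumed approximate IBV genome length in nucleotides
    print(f"expected mutations per copy: {expected_mutations(RATE, GENOME_LENGTH):.2f}")
    print(f"P(at least one mutation):    {prob_at_least_one_mutation(RATE, GENOME_LENGTH):.3f}")
```

Under these assumptions nearly every copy differs from its template, which is consistent with the population being a cloud of related variants rather than a single sequence.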

Page 6:

Evolution of IBV

Quasispecies identified by cloning and Sanger sequencing in both IBV-infected poultry and commercial vaccines [Jackwood, Hilt, and Callison 2003; Hilt, Jackwood, and McKinley 2008]

Page 7:

How Are Quasispecies Contributing to Virus Persistence and Evolution?

• Variants differ in
  – Virulence
  – Ability to escape immune response
  – Resistance to antiviral therapies
  – Tissue tropism

Lauring & Andino, PLoS Pathogens 2011

Page 8:

Project Aims

• Develop bioinformatics tools for accurate reconstruction of quasispecies sequences and their frequencies from next-generation reads

• Study quasispecies persistence and evolution of IBV in commercial layer flocks following vaccination

• Use results of this study to optimize vaccine development and vaccination protocols


Page 10:

Next Generation Sequencing

http://www.economist.com/node/16349358

• Roche/454 FLX Titanium: 400-600 million bases/run, read length up to 1,000 bp
• Illumina HiSeq 2000: up to 6 billion PE reads/run, 35-100 bp read length
• SOLiD 4/5500: 1.4-2.4 billion PE reads/run, 35-50 bp read length
• Ion Torrent PGM: 1-10M reads/run, read length up to 400 bp

Page 11:

Shotgun vs. Amplicon Reads

• Shotgun reads: starting positions distributed ~uniformly
• Amplicon reads: reads have predefined start/end positions covering fixed overlapping windows
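The difference between the two read types can be illustrated with a small simulation. The sketch below is only illustrative: the genome length, read length, amplicon length, and overlap are made-up parameters, and the function names are hypothetical rather than taken from any tool mentioned in this talk.

```python
import random


def shotgun_reads(genome_len, read_len, n_reads, rng):
    """Return (start, end) intervals whose start positions are drawn uniformly."""
    starts = [rng.randrange(0, genome_len - read_len + 1) for _ in range(n_reads)]
    return [(s, s + read_len) for s in starts]


def amplicon_windows(genome_len, amplicon_len, overlap):
    """Tile the genome with fixed windows; consecutive windows share at least
    `overlap` bases. Requires overlap < amplicon_len <= genome_len."""
    step = amplicon_len - overlap
    windows = []
    start = 0
    while start + amplicon_len < genome_len:
        windows.append((start, start + amplicon_len))
        start += step
    # Final window is placed flush with the end of the genome.
    windows.append((genome_len - amplicon_len, genome_len))
    return windows


def amplicon_reads(windows, n_reads, rng):
    """Each amplicon read spans exactly one of the predefined windows."""
    return [rng.choice(windows) for _ in range(n_reads)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sorted(shotgun_reads(1000, 200, 5, rng)))
    windows = amplicon_windows(1000, 300, 100)
    print(windows)
    print(sorted(set(amplicon_reads(windows, 20, rng))))
```

Shotgun reads produce many distinct intervals, while amplicon reads only ever take one of a handful of fixed intervals, which is why the two cases call for different reconstruction methods.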

Page 12:

Reconstruction from Shotgun Reads: ViSpA

Input: shotgun reads
• Read Error Correction
• Read Alignment
• Preprocessing of Aligned Reads
• Read Graph Construction
• Contig Assembly
• Frequency Estimation
Output: quasispecies sequences w/ frequencies

User Specified Parameters: (A) Number of mismatches (B) Mutation rate
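The read graph construction step can be pictured with a simplified example. The sketch below is not ViSpA's implementation. It assumes gap-free reads already aligned to a reference, and it assumes that the mismatch parameter acts as a tolerance on disagreements within the overlap of two reads. All names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class AlignedRead(NamedTuple):
    start: int  # 0-based reference position of the first base
    seq: str    # read bases, assumed gap-free for simplicity

    @property
    def end(self) -> int:
        return self.start + len(self.seq)


def overlap_mismatches(u: AlignedRead, v: AlignedRead) -> int:
    """Count disagreeing bases in the reference interval shared by u and v.
    Returns -1 when the two reads do not overlap."""
    lo, hi = max(u.start, v.start), min(u.end, v.end)
    if lo >= hi:
        return -1
    return sum(u.seq[p - u.start] != v.seq[p - v.start] for p in range(lo, hi))


def build_read_graph(reads: List[AlignedRead],
                     max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """Return directed edges (i, j) where read j starts inside read i, extends
    past its end, and agrees with it on the overlap up to max_mismatches."""
    edges = []
    for i, u in enumerate(reads):
        for j, v in enumerate(reads):
            if i != j and u.start < v.start < u.end < v.end:
                if overlap_mismatches(u, v) <= max_mismatches:
                    edges.append((i, j))
    return edges


if __name__ == "__main__":
    reads = [
        AlignedRead(0, "ACGTAC"),
        AlignedRead(3, "TACGGA"),  # agrees with read 0 on positions 3-5
        AlignedRead(3, "TTCGGA"),  # differs from read 0 at position 4
    ]
    print(build_read_graph(reads, max_mismatches=0))
    print(build_read_graph(reads, max_mismatches=1))
```

Paths through such a graph correspond to chains of mutually consistent reads, which is what makes the later assembly step possible. Raising the tolerance adds edges and so trades specificity for robustness to sequencing errors.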

Page 13:

Reconstruction from Amplicon Reads: VirA

Input: reference in FASTA format; error-corrected read data in SAM/BAM format
• Estimate Amplicons
• Amplicon Read Graph
• Max-Bandwidth Paths
• Frequency Estimation
Output: viral population variants with frequencies
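A max-bandwidth (widest) path is a path whose smallest edge capacity is as large as possible. The sketch below shows the textbook dynamic program for a directed acyclic graph. It is not VirA's code: the vertex names, the capacities, and the idea that capacities stand for read support between consecutive amplicons are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def max_bandwidth_path(order: List[str],
                       edges: Dict[str, List[Tuple[str, float]]],
                       source: str,
                       sink: str) -> Tuple[float, List[str]]:
    """Widest path in a DAG.

    order: all vertices in topological order.
    edges: maps a vertex to a list of (successor, capacity) pairs.
    Returns (bottleneck capacity, path), or (0.0, []) if sink is unreachable.
    """
    unreachable = float("-inf")
    best = {v: unreachable for v in order}  # best bottleneck found so far
    pred: Dict[str, str] = {}
    best[source] = float("inf")
    for u in order:
        if best[u] == unreachable:
            continue
        for v, capacity in edges.get(u, []):
            width = min(best[u], capacity)
            if width > best[v]:
                best[v] = width
                pred[v] = u
    if best[sink] == unreachable:
        return 0.0, []
    path = [sink]
    while path[-1] != source:
        path.append(pred[path[-1]])
    path.reverse()
    return best[sink], path


if __name__ == "__main__":
    # Hypothetical graph: a* and b* are reads from two consecutive amplicons.
    order = ["s", "a1", "a2", "b1", "b2", "t"]
    edges = {
        "s": [("a1", 60), ("a2", 40)],
        "a1": [("b1", 25), ("b2", 35)],
        "a2": [("b1", 30)],
        "b1": [("t", 55)],
        "b2": [("t", 45)],
    }
    print(max_bandwidth_path(order, edges, "s", "t"))
```

In this toy graph the widest path is s, a1, b2, t with bottleneck 35. The other two paths are limited to 25 and 30.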

Page 14:

Amplicon Sequencing Challenges

• Multiple reads from consecutive amplicons may match over their overlap

• Distinct quasispecies may be indistinguishable in an amplicon interval
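When distinct variants look identical within an amplicon, a read from that interval is compatible with more than one of them. A standard way to estimate frequencies despite such ambiguity is expectation-maximization (EM), which splits each ambiguous read among its compatible variants in proportion to the current frequency estimates. The sketch below is a generic illustration, not necessarily the procedure VirA or ViSpA uses. The read classes and counts are invented.

```python
from typing import List, Set, Tuple


def em_frequencies(read_classes: List[Tuple[int, Set[int]]],
                   n_variants: int,
                   n_iter: int = 200) -> List[float]:
    """Estimate variant frequencies from ambiguous reads.

    read_classes: (read_count, set of variant indices the reads are compatible with).
    """
    # Ignore empty classes so that normalization stays well defined.
    classes = [(c, vs) for c, vs in read_classes if c > 0 and vs]
    freqs = [1.0 / n_variants] * n_variants
    for _ in range(n_iter):
        expected = [0.0] * n_variants
        for count, variants in classes:
            norm = sum(freqs[v] for v in variants)
            if norm == 0.0:
                continue
            # E-step: share the reads among compatible variants.
            for v in variants:
                expected[v] += count * freqs[v] / norm
        total = sum(expected)
        # M-step: new frequencies are the normalized expected read counts.
        freqs = [e / total for e in expected]
    return freqs


if __name__ == "__main__":
    # Hypothetical data: 30 reads match only variant 0, 10 match only
    # variant 1, and 60 reads fall where the two variants are identical.
    classes = [(30, {0}), (10, {1}), (60, {0, 1})]
    print(em_frequencies(classes, n_variants=2))
```

In this example the 60 ambiguous reads carry no information on their own, so the estimate converges to the 3:1 ratio set by the unambiguous reads.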


Page 16:

IBV Genome

Rev. Bras. Cienc. Avic. vol.12 no.2 Campinas Apr./June 2010

RT-PCR of S1 using redesigned primers

Page 17:

Experiment 1

10 clone pool composition:
C1 20%, C2 20%, C3 15%, C4 15%, C5 10%, C6 10%, C7 4%, C8 4%, C9 1%, C10 1%

[Diagram labels: M42 sample; 53 plasmid clones; 10 clone pool; 454 reads (two sets); assembled quasispecies V1, V2, V3, …, Vn and PV1, PV2, PV3, …, PVk]
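To get a feel for how many reads the rarest clones in the pool would contribute, the sketch below computes expected read counts and draws one random sample. It assumes reads are sampled in proportion to the pool fractions, ignoring amplification and sequencing bias. The read total of 21,040 is the uncorrected clone pool count reported on the Reads Statistics & Coverage slide. The function names are hypothetical.

```python
import random
from collections import Counter

# Clone pool composition from the slide, as fractions of the pool.
POOL = {"C1": 0.20, "C2": 0.20, "C3": 0.15, "C4": 0.15, "C5": 0.10,
        "C6": 0.10, "C7": 0.04, "C8": 0.04, "C9": 0.01, "C10": 0.01}


def expected_read_counts(pool, n_reads):
    """Expected reads per clone under proportional sampling (an assumption)."""
    return {clone: fraction * n_reads for clone, fraction in pool.items()}


def sample_read_counts(pool, n_reads, seed=0):
    """Draw n_reads clone labels at random, weighted by pool fraction."""
    rng = random.Random(seed)
    clones = list(pool)
    draws = rng.choices(clones, weights=[pool[c] for c in clones], k=n_reads)
    return Counter(draws)


if __name__ == "__main__":
    n_reads = 21_040  # uncorrected M42 clone pool reads
    expected = expected_read_counts(POOL, n_reads)
    sampled = sample_read_counts(POOL, n_reads)
    for clone in POOL:
        print(f"{clone:>3}: expected {expected[clone]:8.1f}, sampled {sampled[clone]:6d}")
```

Under this idealized model even the 1% clones are expected to contribute about 210 reads across the amplified region, so whether they are recovered depends mainly on how well they can be told apart from sequencing errors.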

Page 18:

Evaluated Reconstruction Flows

Page 19:

Reads Statistics & Coverage

Number of reads
Sample            Uncorrected   SAET corrected   Shorah corrected   KEC corrected
M42 isolate       53062         53062            50858              48945
M42 clone pool    21040         21040            19439              17122
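The read counts above are easier to compare as retained fractions. The sketch below computes, for each sample, the share of uncorrected reads that remains after each correction method. The numbers are copied from the table. The dictionary layout and function name are illustrative.

```python
# Read counts from the Reads Statistics & Coverage table.
READ_COUNTS = {
    "M42 isolate": {"Uncorrected": 53062, "SAET": 53062, "Shorah": 50858, "KEC": 48945},
    "M42 clone pool": {"Uncorrected": 21040, "SAET": 21040, "Shorah": 19439, "KEC": 17122},
}


def retained_fractions(counts):
    """Fraction of uncorrected reads remaining after each correction method."""
    baseline = counts["Uncorrected"]
    return {method: n / baseline
            for method, n in counts.items() if method != "Uncorrected"}


if __name__ == "__main__":
    for sample, counts in READ_COUNTS.items():
        summary = ", ".join(f"{method} {fraction:.1%}"
                            for method, fraction in retained_fractions(counts).items())
        print(f"{sample}: {summary}")
```

In both samples the SAET count equals the uncorrected count, while Shorah and KEC output fewer reads, with KEC retaining the fewest.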

Page 20:

Reads Validation

Page 21:

How well we predicted Sanger clones

How well our prediction is

Page 22:

Average Prediction Error

Page 23:

Neighbor-Joining Tree for M42 Sanger Clones & ViSpA Quasispecies

Page 24:

Experiment 2

Page 25:

Reads Statistics & Coverage

Number of reads
Sample         Uncorrected   SAET corrected   Shorah corrected   KEC corrected
M41 Vaccine    92113         92113            87883              85311
Field #1       38502         38502            33685              32521
Field #2       132513        132513           123370             111686
Field #3       76906         76906            71408              64507
Field #4       44467         44467            41653              37295

Page 26:

Neighbor-Joining Tree for Sanger clones and ViSpA Reconstructed Sequences


Page 28:

Summary

• Developed software tools for quasispecies reconstruction from both shotgun and amplicon next-generation reads
  – Code and executables freely available at
    http://alla.cs.gsu.edu/~software/VISPA/vispa.html
    http://alan.cs.gsu.edu/vira/
  – ViSpA plugin developed for users of Ion Torrent, available on the Ion Community

• Experimental results on both simulated and real data show improved accuracy tradeoffs compared to previous methods

• Tools are applicable to quasispecies studies of other viruses

Page 29:

Ongoing Work

• Deployment of ViSpA and VirA on Galaxy servers maintained at UConn and GSU

• Tool validation on Ion Torrent reads

• Comparison of shotgun and amplicon-based reconstruction methods

• Combining long and short read technologies

• Quasispecies persistence studies using longitudinal sampling

Page 30:

Tool Validation for Ion Torrent reads

• Shotgun IBV reads generated using an Ion 316 chip
  – 2,384,007 reads (1,177,740 after SAET correction)
  – mean length 203.58 bp
• ViSpA results
  – 23 quasispecies with estimated frequency > 0.5%, 2,200 total
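Two ratios help put these figures in context: the share of reads remaining after SAET correction, and the share of reconstructed quasispecies whose estimated frequency exceeds 0.5%. The sketch below simply computes them from the numbers on this slide. The constant and function names are illustrative.

```python
# Figures from the Ion Torrent validation slide.
RAW_READS = 2_384_007
SAET_CORRECTED_READS = 1_177_740
QUASISPECIES_TOTAL = 2_200
QUASISPECIES_ABOVE_THRESHOLD = 23  # estimated frequency > 0.5%


def share(part: int, whole: int) -> float:
    """Return part / whole as a fraction between 0 and 1."""
    return part / whole


if __name__ == "__main__":
    print(f"reads remaining after SAET: {share(SAET_CORRECTED_READS, RAW_READS):.1%}")
    print(f"quasispecies above 0.5%:    "
          f"{share(QUASISPECIES_ABOVE_THRESHOLD, QUASISPECIES_TOTAL):.2%}")
```

Roughly half of the reads remain after correction, and only about 1% of the reconstructed sequences pass the 0.5% frequency threshold, so the bulk of the 2,200 candidates are very low-frequency.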

Page 31:

Longitudinal Sampling

Amplicon / shotgun sequencing

Page 32:

Contributors

University of Connecticut:
Rachel O’Neill, Ph.D.
Mazhar Khan, Ph.D.
Hongjun Wang, Ph.D.
Craig Obergfell
Andrew Bligh

Bassam Tork
Ekaterina Nenastyeva
Alex Artyomenko
Serghei Mangul
Nicholas Mancuso
Alexander Zelikovsky

University of Maryland:
Irina Astrovskaya, Ph.D.