Pathogen Profiling Pipeline

Pathogen Profiling Pipeline: A Metagenomics Tool for Rapid Identification of Pathogens from Clinical Specimens. Tom Matthews, National Microbiology Laboratory, Public Health Agency of Canada ([email protected]). M3 SIG – ISMB/ECCB 2009, June 27, 2009.

description

Metagenomic sample analysis pipeline. Talk at the M3 SIG at ISMB 2009 in Stockholm, Sweden.

Transcript of Pathogen Profiling Pipeline

Page 1: Pathogen Profiling Pipeline

Pathogen Profiling Pipeline

A Metagenomics Tool for Rapid Identification of Pathogens from Clinical Specimens

Tom Matthews
National Microbiology Laboratory
Public Health Agency of Canada

[email protected]

M3 SIG – ISMB/ECCB 2009, June 27, 2009

Page 2: Pathogen Profiling Pipeline

Introduction

● With novel or emerging diseases, classical pathogen identification may not always produce results

● Advances in next-gen sequencing technology
    ● Characterize samples at genomic level

● Pathogen Profiling Pipeline
    ● Bioinformatics pipeline
    ● Analysis of host and microbial nucleic acids

Page 3: Pathogen Profiling Pipeline

Features

● Nucleotide and protein BLAST analysis

● Unbiased analysis of input reads

● Clustered execution

● Web front-end

● Custom analysis pipelines

● Easily viewed results

Page 4: Pathogen Profiling Pipeline

Filtering Overview

● BLAST analysis performed against reference sequence database

● Assigns hits according to cut-off criteria

● Calculates equivalent hits

● Clustered BLAST and filtering
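
The cut-off step above can be illustrated with a minimal sketch. It assumes BLAST's standard 12-column tabular output (query, subject, percent identity, length, mismatches, gap opens, query start/end, subject start/end, e-value, bitscore). The `Hit` type, the function names, and the threshold values are hypothetical; the slides do not state the pipeline's actual criteria.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    read_id: str
    subject_id: str
    evalue: float
    bitscore: float


def parse_tabular(lines):
    """Parse 12-column BLAST tabular lines into Hit records."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Columns 11 and 12 (indices 10 and 11) hold e-value and bitscore.
        hits.append(Hit(fields[0], fields[1], float(fields[10]), float(fields[11])))
    return hits


def passes_cutoff(hit, max_evalue=1e-5, min_bitscore=50.0):
    """Hypothetical cut-off criteria; real thresholds are user-defined."""
    return hit.evalue <= max_evalue and hit.bitscore >= min_bitscore


sample = [
    "read1\tsubjA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t180.0",
    "read1\tsubjB\t80.0\t60\t12\t0\t1\t60\t1\t60\t0.5\t30.0",
]
kept = [h for h in parse_tabular(sample) if passes_cutoff(h)]
print(kept)
```

Only the first sample line satisfies both thresholds, so `kept` holds a single hit against `subjA`.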

Page 5: Pathogen Profiling Pipeline

Last Common Ancestor Estimation

● Uses equivalent hits for LCA calculation

● User specifies equivalent hit percentage cutoff

● NCBI taxonomy database for ancestor lookup

● Walks up taxonomy tree to find lowest intersection of all leaf nodes

● Unbiased approach

[Figure: example taxonomy tree with leaf nodes Vaccinia, Camelpox, Taterapox, and Variola under the common ancestor Orthopoxvirus]
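
The tree walk described on this slide can be sketched as follows. This is an illustration only: `PARENT` is a small, simplified, hypothetical child-to-parent map standing in for the NCBI taxonomy database, and the function names are not taken from the pipeline.

```python
# Toy child -> parent map; a simplified, hypothetical stand-in for NCBI taxonomy.
PARENT = {
    "Vaccinia virus": "Orthopoxvirus",
    "Camelpox virus": "Orthopoxvirus",
    "Taterapox virus": "Orthopoxvirus",
    "Variola virus": "Orthopoxvirus",
    "Orthopoxvirus": "Poxviridae",
    "Poxviridae": "Viruses",
    "Influenza A virus": "Orthomyxoviridae",
    "Orthomyxoviridae": "Viruses",
    "Viruses": "root",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def last_common_ancestor(taxa):
    """Return the lowest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("at least one taxon is required")
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon))
    # Walk up from the first leaf; the first shared node is the lowest one.
    for node in lineage(taxa[0]):
        if node in shared:
            return node
    raise ValueError("taxa share no ancestor")


pox = ["Vaccinia virus", "Camelpox virus", "Taterapox virus", "Variola virus"]
print(last_common_ancestor(pox))
print(last_common_ancestor(pox + ["Influenza A virus"]))
```

With the four poxvirus leaves the walk stops at Orthopoxvirus; adding an unrelated hit pushes the estimate up to a much broader node, which is why the equivalent-hit cutoff matters.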

Page 6: Pathogen Profiling Pipeline

Filtering Outputs

● Hits – High scoring reads passing filtering values

● Equivalent Hits – BLAST hits matching to within an assigned percentage of the top hit's bitscore

● Last Common Ancestors – Calculated (estimated) LCA of all the equivalent hits

● Unassigned – Passed to the next pipeline step
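
One way to read the equivalent-hit definition above is sketched below. It assumes "within an assigned percentage of the top hit's bitscore" means a bitscore of at least `top * (1 - percent / 100)`; that interpretation, the function name, and the example scores are assumptions, not details from the slides.

```python
def equivalent_hits(hits, percent):
    """Keep the hits for one read whose bitscore is within `percent` of the top.

    `hits` is a list of (subject, bitscore) pairs for a single read.
    """
    if not hits:
        return []
    top = max(score for _, score in hits)
    threshold = top * (1.0 - percent / 100.0)
    return [(subject, score) for subject, score in hits if score >= threshold]


# Hypothetical scores for one read.
read_hits = [
    ("Vaccinia virus", 200.0),
    ("Camelpox virus", 196.0),
    ("Variola virus", 150.0),
]
print(equivalent_hits(read_hits, 5))
```

With a 5% cutoff the threshold is about 190, so the first two hits are treated as equivalent and would feed the LCA step, while the third is dropped.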

Page 7: Pathogen Profiling Pipeline

Example Analysis Method

● BLAST reads against host database

● Remove host reads

● BLAST unassigned against reference database

● Filter hits vs. unassigned

● Repeat...

● Post analysis

[Figure: workflow diagram. Sample reads pass through BLAST and filtering against the host genome; the non-host reads pass through BLAST and filtering against viral, bacterial, protozoan, and fungal genomes; results are pooled to give the unique organisms in the sample.]
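
The analysis method on this slide can be sketched as a loop over databases, following the sequential reading of the bullet list. Everything here is illustrative: `run_pipeline`, `toy_classify`, the read identifiers, and the lookup table are hypothetical, and `toy_classify` stands in for a real BLAST-and-filter step.

```python
def run_pipeline(reads, stages, classify):
    """Run reads through each database in turn.

    `classify(read, db)` returns a taxon name, or None if the read stays
    unassigned. Unassigned reads are passed on to the next stage.
    """
    assigned = {}
    unassigned = list(reads)
    for db in stages:
        still_unassigned = []
        for read in unassigned:
            taxon = classify(read, db)
            if taxon is None:
                still_unassigned.append(read)
            else:
                assigned[read] = (db, taxon)
        unassigned = still_unassigned
    return assigned, unassigned


# Hypothetical lookup table replacing real BLAST searches.
TOY_MATCHES = {
    ("r1", "host"): "Homo sapiens",
    ("r2", "viral"): "Vaccinia virus",
    ("r3", "viral"): "Influenza A virus",
    ("r4", "bacterial"): "Streptococcus mitis",
}


def toy_classify(read, db):
    return TOY_MATCHES.get((read, db))


assigned, leftover = run_pipeline(
    ["r1", "r2", "r3", "r4", "r5"], ["host", "viral", "bacterial"], toy_classify
)
# Post analysis: pool non-host assignments into the unique organisms found.
organisms = sorted({taxon for db, taxon in assigned.values() if db != "host"})
print(organisms, leftover)
```

Putting the host database first removes host reads before any reference search, so later stages only see the smaller non-host set.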

Page 8: Pathogen Profiling Pipeline

Pipeline Construction

Page 9: Pathogen Profiling Pipeline

Pipeline Construction

Page 10: Pathogen Profiling Pipeline

Pipeline Construction

Page 11: Pathogen Profiling Pipeline

Pipeline Construction

Page 12: Pathogen Profiling Pipeline

Pipeline Construction

Page 13: Pathogen Profiling Pipeline

Pipeline Execution

● Custom execution manager

● Computes dependencies and monitors running jobs

● Distributes jobs across Linux cluster

● Facilitates unattended clustered executions
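
Dependency computation of the kind listed above can be illustrated with Python's standard `graphlib` module (Python 3.9 or later). This is a generic sketch, not the pipeline's custom execution manager; the job names are hypothetical, and jobs are simply grouped into waves rather than submitted to a cluster.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs, each mapped to the set of jobs it depends on.
JOBS = {
    "blast_host": set(),
    "filter_host": {"blast_host"},
    "blast_viral": {"filter_host"},
    "blast_bacterial": {"filter_host"},
    "pool_results": {"blast_viral", "blast_bacterial"},
}


def run_in_waves(jobs):
    """Group jobs into waves whose members could run at the same time."""
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real manager would dispatch these to cluster nodes and wait here.
        sorter.done(*ready)
    return waves


waves = run_in_waves(JOBS)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The two reference searches land in the same wave because they depend only on the host filter, which is the kind of parallelism a cluster scheduler can exploit.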

Page 14: Pathogen Profiling Pipeline

Reports

Page 15: Pathogen Profiling Pipeline

Drill Down Reports

Page 16: Pathogen Profiling Pipeline

Abundance View

● Displays abundance of taxonomic hits
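
As a rough illustration of what an abundance view summarizes, the sketch below counts reads per taxon and reports each taxon's share. The function name and the example assignments are hypothetical; the slide does not say how the pipeline computes or displays abundance.

```python
from collections import Counter


def abundance(assignments):
    """Return (taxon, read count, fraction of assigned reads), most abundant first."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return [(taxon, n, n / total) for taxon, n in counts.most_common()]


# Hypothetical per-read taxon assignments.
table = abundance(["Vaccinia virus", "Vaccinia virus", "Influenza A virus"])
for taxon, n, fraction in table:
    print(f"{taxon}\t{n}\t{fraction:.2f}")
```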

Page 17: Pathogen Profiling Pipeline

Example Run

● Mouth swab input samples

● Two pools:
    ● Samples spiked with Vaccinia and Influenza A
    ● Background reference sample

Page 18: Pathogen Profiling Pipeline

Example Run

Page 19: Pathogen Profiling Pipeline

Example Run

Page 20: Pathogen Profiling Pipeline

Example Run

Page 21: Pathogen Profiling Pipeline

Example Run

Page 22: Pathogen Profiling Pipeline

Wrap-up

● Unbiased analysis of input reads

● Custom analysis pipelines

● Last common ancestor calculation

● Clustered execution

● Multiple report views

● Exportable results

Page 23: Pathogen Profiling Pipeline

Acknowledgements

● Gary Van Domselaar

● Morag Graham

● Shaun Tyler

● Heather Kent

● Kim Melnychuk

● Christine Bonner

● Geoff Peters

● Philip Mabon