Computational Resources In Infectious Disease

Computational Resources in Infectious Disease. João André Carriço, Microbiology Institute and Instituto de Medicina Molecular, Faculty of Medicine, University of Lisbon. [email protected] Twitter: @jacarrico. ME081 – Meet-The-Expert Session, 26th ECCMID, Amsterdam, Netherlands, 7-12 April 2016

Transcript of Computational Resources In Infectious Disease

Page 1: Computational Resources In Infectious Disease

Computational Resources in Infectious Disease

João André Carriço, Microbiology Institute and Instituto de Medicina Molecular, Faculty of Medicine, University of Lisbon

[email protected] Twitter: @jacarrico

ME081 – Meet-The-Expert Session, 26th ECCMID, Amsterdam, Netherlands, 7-12 April 2016

Page 2: Computational Resources In Infectious Disease

Disclaimer

This presentation is not intended to cover all available software or databases (we would need several weeks or months to do that).

I'll present what I use or intend to use in the near future.

I gladly accept any suggestions to include in similar presentations in the future.

It is supposed to be interactive, so ask away during the presentation.

Page 3: Computational Resources In Infectious Disease

Summary

Available Databases
• Virulence Factors and AMR DBs
• Sequence-based typing databases: Pubmlst.org / Enterobase

High Throughput Sequencing data analysis (freeware)
• Prokka
• Roary
• Nullarbor
• Microreact.org
• PHYLOViZ

Commercial Solutions
• Bionumerics 7.5
• CLC Genomics Workbench (CLC Bio)
• Ridom Seqsphere+

Page 4: Computational Resources In Infectious Disease

Virulence Factor (VF) Databases

• VFDB (http://www.mgc.ac.cn/VFs/main.htm)
• Pathosystems Resource Integration Center (PATRIC) VF (https://www.patricbrc.org/)
• Victors (http://www.phidias.us/victors/)
• PHI-Base (http://www.phi-base.org/)
• MvirDB (http://mvirdb.llnl.gov/)

To know more: presentation in the session "Controversies in interpreting whole genome sequence data": http://eccmidlive.org/#resources/how-can-we-design-actionable-virulome-databases

Page 5: Computational Resources In Infectious Disease

Antibiotic Resistance Databases

• Comprehensive Antibiotic Resistance Database (CARD) (https://card.mcmaster.ca/)
• Repository of Antibiotic resistance Cassettes (RAC) (http://rac.aihi.mq.edu.au/rac/)
• INTEGRALL: the integron database (http://integrall.bio.ua.pt/)

(…)

Page 6: Computational Resources In Infectious Disease

Sequenced my strain… now what?

To know more: http://www.slideshare.net/nickloman/eccmid-2015-so-i-have-sequenced-my-genome-what-now

[Workflow diagram]

Reads (fastq files) → De novo assembler → Contigs (fasta files) → Prokka → Annotated contigs (gbk/gff files)

Tools shown in the diagram:
• Roary: pan-genome analysis
• Enterobase
• BIGSdb
• Nullarbor
• PHYLOViZ: tree + metadata visualization
• Microreact.org: tree + metadata + visualization

Page 7: Computational Resources In Infectious Disease

Sequence-Based Typing: PubMLST / BIGSdb

http://www.pubmlst.org

http://bigsdb.web.pasteur.fr/

Page 8: Computational Resources In Infectious Disease

Sequence-Based Typing: Enterobase

Slide by @happy_khan

Martin Sergeant, Mark Achtman, Nabil-Fareed Alikhan, Zhemin Zhou

Page 9: Computational Resources In Infectious Disease

Prokka

Genome annotation made easy, by Torsten Seemann (slides by Torsten)

Genome annotation: adding biological information to the sequence by describing features

To know more: http://www.slideshare.net/torstenseemann/prokka-rapid-bacterial-genome-annotation-abphm-2013

Available at: https://github.com/tseemann/prokka

Page 10: Computational Resources In Infectious Disease

Roary

Pan-genome analysis, by Andrew Page

Available at: https://sangerpathogens.github.io/Roary/

Core genome

Accessory genome

Pan-genome
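The three terms above can be read as set operations over the genes found in each strain. The following is a minimal sketch of that idea only, not Roary's clustering method; the strain and gene names are hypothetical, and the strict "present in every strain" definition of core is an assumption made for simplicity.

```python
def pan_core_accessory(genomes):
    """Split genes into pan, core and accessory sets.

    genomes: dict mapping strain name -> iterable of gene names.
    Core is taken strictly here (present in every strain); this is a
    simplifying assumption, not Roary's own definition.
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        return set(), set(), set()
    pan = set().union(*gene_sets)            # every gene seen in any strain
    core = set.intersection(*gene_sets)      # genes shared by all strains
    accessory = pan - core                   # everything that is not core
    return pan, core, accessory


# Hypothetical example data
example = {
    "strainA": {"gyrB", "recA", "mecA"},
    "strainB": {"gyrB", "recA", "vanA"},
    "strainC": {"gyrB", "recA"},
}
pan, core, accessory = pan_core_accessory(example)
print("pan:", sorted(pan))
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

In practice, deciding that two sequences are "the same gene" is the hard part, and that is what a tool such as Roary does before any set arithmetic applies.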

Page 11: Computational Resources In Infectious Disease

Roary

Inputs:
• Annotated de novo assemblies (GFF files)
• Typically from the annotation pipeline

Outputs:
• Spreadsheet with presence and absence of genes
• Multi-FASTA alignment of core genes so you can build a tree without a reference
• Multi-FASTA alignments for each gene
• Plots for the open/closed genome, unique genes
• Integrates with iCANDY so you can visualise all structural variation
• QC report from Kraken to help identify suspect samples

(Slide by Andrew Page)
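As a rough illustration of how annotated assemblies reach Roary, the sketch below builds (but does not run) one Prokka command per assembly and a single Roary command over the resulting GFF files. The sample names, directory layout and thread count are hypothetical. The options used (`--outdir` and `--prefix` for Prokka; `-e`, `-n`, `-p` and `-f` for Roary) are taken from the tools' documentation as I recall it, so check them against the installed versions.

```python
import shlex


def prokka_command(contigs, outdir, prefix):
    """Command line to annotate one assembly with Prokka."""
    return ["prokka", "--outdir", outdir, "--prefix", prefix, contigs]


def roary_command(gff_files, outdir, threads=4):
    """Command line to run Roary on annotated assemblies.

    -e with -n requests a core gene alignment built with MAFFT,
    -p sets the number of threads, -f the output directory.
    """
    return ["roary", "-e", "-n", "-p", str(threads), "-f", outdir, *gff_files]


# Hypothetical sample names and paths
samples = ["strainA", "strainB", "strainC"]
commands = []
gffs = []
for name in samples:
    outdir = f"annotation/{name}"
    commands.append(prokka_command(f"assemblies/{name}.fasta", outdir, name))
    gffs.append(f"{outdir}/{name}.gff")  # Prokka names outputs after the prefix
commands.append(roary_command(gffs, "roary_out"))

for cmd in commands:
    # Printed only; pass each list to subprocess.run(cmd, check=True) to execute
    print(shlex.join(cmd))
```

Building the commands as lists keeps file names with spaces safe and makes the pipeline easy to inspect before anything is executed.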

Page 12: Computational Resources In Infectious Disease

Roary outputs

• Core (n or n-1 strains)
• Soft-Core (n-2 or n-3 strains)
• Shell (8(?) to n-3 strains)
• Cloud (<8(?) strains)

Core genome: Core + Soft-Core

Accessory genome: Shell + Cloud
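The strain counts on this slide are marked as uncertain, so the sketch below does not reproduce them. It assumes percentage cut-offs instead (99%, 95% and 15% of strains), which I believe correspond to the categories in Roary's summary statistics but which should be verified for the version in use. The function name and example counts are hypothetical.

```python
from collections import Counter


def classify_gene(n_present, n_strains, core_pct=99, soft_core_pct=95, shell_pct=15):
    """Assign a gene to core / soft-core / shell / cloud.

    The percentage thresholds are assumptions, exposed as parameters.
    Integer arithmetic avoids floating-point edge cases at the cut-offs.
    """
    if n_strains <= 0 or not 0 <= n_present <= n_strains:
        raise ValueError("need 0 <= n_present <= n_strains and n_strains > 0")
    if n_present * 100 >= core_pct * n_strains:
        return "core"
    if n_present * 100 >= soft_core_pct * n_strains:
        return "soft-core"
    if n_present * 100 >= shell_pct * n_strains:
        return "shell"
    return "cloud"


# Hypothetical counts: gene -> number of strains (out of 100) carrying it
counts = {"gyrB": 100, "recA": 99, "fliC": 96, "blaTEM": 50, "vanA": 3}
categories = {gene: classify_gene(n, 100) for gene, n in counts.items()}
summary = Counter(categories.values())

core_genome = summary["core"] + summary["soft-core"]      # Core + Soft-Core
accessory_genome = summary["shell"] + summary["cloud"]    # Shell + Cloud
print(categories)
print("core genome:", core_genome, "accessory genome:", accessory_genome)
```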

Page 13: Computational Resources In Infectious Disease

Roary outputs

iCANDY output of presence and absence of genes in accessory genome. S. Weltevreden & public S. enterica genomes

(Slide by Andrew Page)

Page 14: Computational Resources In Infectious Disease

Nullarbor

Complete pipeline from reads to reports, by Torsten Seemann

The objective is to automate analysis for everyday use in public health labs / research settings

Uses and distills the outputs of many software tools

Available at: https://github.com/tseemann/nullarbor

Page 15: Computational Resources In Infectious Disease

Nullarbor

Slide by Torsten Seemann

Page 16: Computational Resources In Infectious Disease

Nullarbor

From: https://github.com/tseemann/nullarbor

Page 17: Computational Resources In Infectious Disease

Some Nullarbor outputs in report

Slides by Torsten Seemann

Page 18: Computational Resources In Infectious Disease

PHYLOViZ

www.phyloviz.net

Page 19: Computational Resources In Infectious Disease

PHYLOViZ

Inputs:
- Tab-separated txt (profiles)
- Fasta files
- Automatic database retrieval (MLST)

Outputs:
• goeBURST and goeBURST MST
• Link quality assessment
• High quality images

Can be easily applied to:
- MLST / cgMLST / wgMLST
- MLVA
- SNP data*
- Gene presence/absence
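To make the profile input and MST output more concrete, here is a minimal sketch that reads tab-separated allelic profiles, computes Hamming distances (number of differing loci) and builds a plain minimum spanning tree with Prim's algorithm. It is not goeBURST: goeBURST applies its own tiebreak rules when several links have the same distance, whereas this sketch simply breaks ties by profile name. The column names and profiles are hypothetical.

```python
import csv
import io


def read_profiles(text):
    """Parse tab-separated profiles: first column is the ID, the rest are alleles."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    return {row[0]: tuple(row[1:]) for row in reader if row}


def hamming(a, b):
    """Number of loci at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Prim's algorithm; returns a list of (from, to, distance) edges."""
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest link from the tree to a node outside it (ties broken by name)
        dist, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Hypothetical 7-locus MLST profiles
rows = [
    ["ST", "l1", "l2", "l3", "l4", "l5", "l6", "l7"],
    ["ST1", "1", "1", "1", "1", "1", "1", "1"],
    ["ST2", "1", "1", "1", "1", "1", "1", "2"],
    ["ST3", "1", "1", "1", "1", "1", "2", "2"],
    ["ST4", "5", "5", "5", "5", "1", "1", "1"],
]
text = "\n".join("\t".join(r) for r in rows)
tree = minimum_spanning_tree(read_profiles(text))
for u, v, d in tree:
    print(u, v, d)
```

The same distance function works for cgMLST/wgMLST profiles and for gene presence/absence vectors, since all of them are fixed-length categorical profiles.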

Page 20: Computational Resources In Infectious Disease

PHYLOViZ 2.0

New features:
• Hierarchical clustering
• Neighbor-Joining
• Project saving

Page 21: Computational Resources In Infectious Disease

PHYLOViZ Online

Available at http://online.phyloviz.net

• Web-based version of PHYLOViZ
• Allows users to create their own datasets, save them and share their data (privately or publicly)
• REST API available
• Scalable to thousands of nodes
• Tree analysis tools: interactive distance matrix, NLV graph
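The slide does not define the NLV graph. The sketch below assumes it stands for "N locus variant" graph, meaning that every pair of profiles differing at up to N loci is linked, rather than only the links kept by a spanning tree. This is an illustration of that assumed definition, not PHYLOViZ Online's implementation; the profiles are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of loci at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def nlv_graph(profiles, n):
    """All (from, to, distance) links between profiles differing at <= n loci."""
    edges = []
    for u, v in combinations(sorted(profiles), 2):
        d = hamming(profiles[u], profiles[v])
        if d <= n:
            edges.append((u, v, d))
    return edges


# Hypothetical 7-locus profiles
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 2, 2),
    "ST4": (5, 5, 5, 5, 1, 1, 1),
}
for n in (1, 2):
    print(n, nlv_graph(profiles, n))
```

Unlike a spanning tree, such a graph can contain cycles and can leave distant profiles unconnected, which is why it is useful for checking how well a tree represents the underlying distances.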

Page 22: Computational Resources In Infectious Disease

PHYLOViZ Online

Slide by @happy_khan

Page 23: Computational Resources In Infectious Disease

PHYLOViZ Online

Page 24: Computational Resources In Infectious Disease

PHYLOViZ Online

NLV Graph

Tree cut-off

Full MST

Page 25: Computational Resources In Infectious Disease

microreact.org

Page 26: Computational Resources In Infectious Disease

microreact.org

Page 27: Computational Resources In Infectious Disease

microreact.org

Create Selections

Change tree options

Page 28: Computational Resources In Infectious Disease

microreact.org

Available at http://microreact.org/

To know more: presentation in the session "Harnessing whole genome sequence data for public health applications": Novel open access tools for WGS-based pathogen surveillance and the identification of high-risk clones

http://eccmidlive.org/#resources/novel-open-access-tools-for-wgs-based-pathogen-surveillance-and-the-identification-of-high-risk-clones

Page 29: Computational Resources In Infectious Disease

Meet The Experts (available on twitter by order of appearance)

Page 30: Computational Resources In Infectious Disease

Commercial solutions

• Ridom Seqsphere+: http://www.ridom.de/seqsphere/
• Applied Maths Bionumerics 7.6: http://www.applied-maths.com/bionumerics
• CLCBio Genomic Workbench: http://www.clcbio.com/blog/clc-genomics-workbench-7-5/

Page 31: Computational Resources In Infectious Disease

Take-home messages

• Huge variety of software and database solutions

• There is no single one-size-fits-all solution (job security for bioinformaticians)

• Different questions require different approaches

• Always question the results and data provenance

Page 32: Computational Resources In Infectious Disease

More references/presentations

ECCMID 2015 Meet-the-expert session on "What bioinformatic tools should I use for analysis of High Throughput Sequencing data for molecular diagnostics?"

Nick Loman: http://www.slideshare.net/nickloman/eccmid-2015-meettheexpert-bioinformatics-tools

João André Carriço: http://www.slideshare.net/joaoandrecarrico/eccmid-meet-theexpert2015

Page 33: Computational Resources In Infectious Disease

Acknowledgments

UMMI Members: Bruno Gonçalves, Mário Ramirez, José Melo-Cristino

INESC-ID: Alexandre Francisco, Cátia Vaz, Marta Nascimento

EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/): Mirko Rossi

FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/):
• Dag Harmsen (Univ. Muenster)
• Stefan Niemann (Research Center Borstel)
• Keith Jolley, James Bray and Martin Maiden (Univ. Oxford)
• Joerg Rothganger (RIDOM)
• Hannes Pouseele (Applied Maths)

Genome Canada IRIDA (Integrated Rapid Infectious Disease Analysis) project (www.irida.ca):
• Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NML, PHAC)
• Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC)
• Fiona Brinkman (SFU)
• William Hsiao (BCCDC)