RAMI-NGS, Hamburg, Germany, 9-11 June 2016 · 2016-06-21 · Read mapping algorithms Bowtie2 BWA...
João André Carriço, Mário Ramirez
Microbiology Institute and Instituto de Medicina Molecular, Faculty of Medicine, University of Lisbon
twitter: @jacarrico
RAMI-NGS, Hamburg, Germany, 9-11 June 2016
Moving from Typing into High Throughput Sequencing (HTS) Genomics:
Increase in discrimination
Extra information to be extracted from the genome (resistance profiles, virulence factors, genome organization)
Global Outbreak detection / Surveillance
Direct application in public health
Source attribution -> intervention
Image credits: 1) http://www.iissiidiology.net/en/publications/104-ayfaar-interpersonal-and-true-human-relationship-harmonization-mechanisms 2) http://blog.f1000research.com/2014/04/04/reproducibility-tweetchat-recap/
Data Integration
Harmonization Reproducibility
Algorithms
Interfaces
Ontologies
Read mapping algorithms:
Bowtie2
BWA
SOAP2
Saruman
mr/mrsFAST
… (and a lot more)
Algorithms
Hatem M et al. BMC Bioinformatics 2013; 14:184. DOI: 10.1186/1471-2105-14-184
+ a plethora of parameters for each of them
+ a (proper) choice of reference
Gene-by-gene approach allele call algorithms:
BIGSdb (Jolley, K. A. & Maiden, M. C. J. BMC Bioinf 11, 595 (2010))
Enterobase (https://enterobase.warwick.ac.uk/)
GEP (Genome Profiler) (JCM. 2015 May;53(5):1765-7)
Ridom Seqsphere
Bionumerics (Applied Maths)
Mostly assembly based (yes it is a lot of work … )
Assembly algorithms have some parameters (mostly k-mer sizes)
Lots of heuristics for allele definition.
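To illustrate why k-mer size matters as an assembly parameter, here is a minimal sketch (not taken from any of the tools above) that compares the total and distinct k-mer counts of a toy sequence for several values of k. The sequence and the function names are made up for illustration.

```python
def kmers(seq, k):
    """Return all overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_distinct_kmers(seq, k):
    """Number of distinct k-mers found in seq."""
    return len(set(kmers(seq, k)))


# Toy sequence containing a short repeat ("ACGT" occurs twice).
toy = "ACGTTTACGTAA"
for k in (3, 4, 5):
    total = len(kmers(toy, k))
    distinct = count_distinct_kmers(toy, k)
    # When total > distinct, some k-mer is repeated and the repeat is unresolved.
    print(k, total, distinct)
```

In this toy case the repeat collapses some k-mers at k=3 and k=4, while at k=5 every k-mer is unique, which is one reason the choice of k can change an assembly.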
Algorithms
Gene by gene approaches:
What is a locus?
What is an allele?
It depends on the algorithm(s) used!
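A toy illustration of why the answer depends on the algorithm: the sketch below uses the simplest possible rule, an exact sequence match against a per-locus table, and assigns a new allele number to any unseen sequence. The heuristics used by real allele callers are not modelled here; the class name `AlleleTable` and the sequences are hypothetical.

```python
class AlleleTable:
    """Exact-match allele numbering for a single locus (illustrative only)."""

    def __init__(self):
        self._ids = {}  # sequence -> allele number

    def call(self, seq):
        """Return the allele number for seq, registering it if unseen."""
        seq = seq.upper()
        if seq not in self._ids:
            self._ids[seq] = len(self._ids) + 1
        return self._ids[seq]


locus = AlleleTable()
print(locus.call("ATGAAATAA"))  # first sequence seen
print(locus.call("ATGAAGTAA"))  # one SNP: a different allele under exact matching
print(locus.call("atgaaataa"))  # same sequence as the first, ignoring case
```

A caller with a different rule (for example, one that tolerates partial matches) could assign these sequences differently, which is the point of the slide.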
Algorithms
However, the results are largely congruent!
Ontologies
Image from http://www.emiliosanfilippo.it/?page_id=1172
“Formal representation of knowledge as a set of concepts within a domain, and the relationships between those concepts” – Wikipedia
Domain modeling: represents all the concepts involved in microbial typing by sequence-based methods
Provides a shared vocabulary, where the concepts should be unambiguous
Enables a machine-readable format that software and algorithms can use to interact automatically with multiple databases
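As a rough sketch of what "machine-readable" buys: if concepts and relationships are stored as subject-predicate-object triples, software can query them without human interpretation. The terms below (`Allele`, `Locus`, `isOfLocus`, `hasAllele`) are invented for illustration and are not taken from any published typing ontology.

```python
# Each fact is a (subject, predicate, object) triple.
triples = {
    ("allele_7", "isA", "Allele"),
    ("allele_7", "isOfLocus", "locus_abcZ"),
    ("locus_abcZ", "isA", "Locus"),
    ("isolate_1", "hasAllele", "allele_7"),
}


def query(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def loci_of_isolate(store, isolate):
    """Follow isolate -> allele -> locus relationships."""
    loci = set()
    for _, _, allele in query(store, s=isolate, p="hasAllele"):
        for _, _, locus in query(store, s=allele, p="isOfLocus"):
            loci.add(locus)
    return sorted(loci)


print(loci_of_isolate(triples, "isolate_1"))
```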
Ontologies
GenEpiO: Combining Different Epi, Lab, Genomics and Clinical Data Fields.
Lab Analytics: Genomics, PFGE, Serotyping, Phage typing, MLST, AMR
Clinical Data: Patient demographics, Medical History, Comorbidities, Symptoms, Health Status
Reporting: Case/Investigation Status
GenEpiO (Genomic Epidemiology Application Ontology)
See draft version at https://github.com/Public-Health-Bioinformatics/IRIDA_ontology
Original slide from Emma Griffiths
Ontologies
Public Health Surveillance
Case Cluster Analysis
Result Reporting
Infectious Disease Epidemiology (from case to Intervention)
Lab Surveillance (from sample to strain typing results)
Evidence Collection & Outbreak Investigation
Sample Collection & Processing
Sequence Data Generation & Processing
Bioinformatics Analysis
Result Reporting
Whole Genome Sequencing (SO, ERO, OBI etc)
Quality Control (OBI, ERO)
Anatomy (FMA)
Environment (Envo)
Food (FoodOn)
Clinical Sampling (OBI)
Custom LIMS
Quality Control (OBI, ERO)
AMR (ARO)
Virulence (PATO)
Phylogenetic Clustering (EDAM)
Mobile Elements (MobiO)
Quality Control (OBI, ERO)
AMR (ARO)
LOINC
Surveillance (SurvO)
Demographics (SIO)
Patient History (SIO)
Symptoms (SYMP)
Exposures (ExO)
Source Attribution (IDO)
Travel (IDO)
Transmission (TRANS)
Food (FoodOn)
Geography (OMRSE)
Outbreak Protocols
Surveillance (SurvO)
Food (FoodOn)
Surveillance (SurvO)
Mobile Elements (MobiO)
Infectious Disease (IDO)
Typing (TypON)
Nomenclature & Taxonomy (NCBItaxon)
Original slide from Emma Griffiths /IRIDA
http://foodontology.github.io/foodon/
(pipeline) NGSOnto
Provides a machine-readable, web-based interface, i.e., the algorithms (not humans) can:
retrieve, submit, update data/analysis results
launch analysis/algorithms
Interfaces
http://www.clker.com/cliparts/q/P/V/D/5/R/cog-allgrey-hi.png
BIGSdb and Enterobase offer a RESTful API for data retrieval, submission and analysis
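A minimal sketch of how a script, rather than a person, might talk to such an API. The base URL, paths and JSON fields are placeholders and do not describe the actual BIGSdb or Enterobase endpoints; the requests are only constructed here, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "https://typing.example.org/api"  # hypothetical server


def build_get(path):
    """Build a GET request for retrieving data."""
    return Request(f"{BASE_URL}/{path.lstrip('/')}",
                   headers={"Accept": "application/json"}, method="GET")


def build_post(path, payload):
    """Build a POST request submitting JSON data (e.g. a new profile)."""
    body = json.dumps(payload).encode("utf-8")
    return Request(f"{BASE_URL}/{path.lstrip('/')}", data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


get_req = build_get("/loci")
post_req = build_post("/profiles", {"isolate": "isolate_1", "alleles": [1, 4, 2]})
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
# To actually send a request one would pass it to urllib.request.urlopen().
```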
Interfaces
https://online.phyloviz.net/
API:
* account creation
* profile + metadata upload
* running goeBURST
* retrieving a link
Private or public data sharing
Scalable to thousands of nodes
Tree analysis tools:
Interactive distance matrix
NLV graph
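The distance matrix and NLV graph can be sketched as follows, under the assumption that profiles are compared by the number of loci with differing alleles and that an NLV (N locus variant) graph links every pair of profiles differing at no more than N loci. This is an illustration only, not the PHYLOViZ Online implementation; the profile values are invented.

```python
from itertools import combinations


def hamming(p, q):
    """Number of loci at which two allelic profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(p, q))


def distance_matrix(profiles):
    """Symmetric matrix of pairwise locus differences, as nested dicts."""
    names = sorted(profiles)
    return {a: {b: hamming(profiles[a], profiles[b]) for b in names} for a in names}


def nlv_edges(profiles, n):
    """Pairs of profiles that differ at no more than n loci."""
    return [(a, b) for a, b in combinations(sorted(profiles), 2)
            if hamming(profiles[a], profiles[b]) <= n]


profiles = {
    "ST1": (1, 1, 1, 1),
    "ST2": (1, 1, 1, 2),  # single locus variant of ST1
    "ST3": (1, 2, 1, 2),  # double locus variant of ST1
}
print(distance_matrix(profiles)["ST1"])
print(nlv_edges(profiles, 1))
```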
Transparency of analytical methods
Better definition of concepts (Clinical/Lab/Analysis)
Better tool/database interoperability
• Reproducibility of results
• Creation of modular analysis with added value
• Custom interfaces for non-bioinformatics specialists
Actionable Results
UMMI Members
Bruno Gonçalves, Mickael Silva, Miguel Machado, Mário Ramirez, José Melo-Cristino
INESC-ID: Alexandre Francisco, Cátia Vaz, Marta Nascimento
EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/)
Mirko Rossi
FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/):
Dag Harmsen (Univ. Muenster), Stefan Niemann (Research Center Borstel), Keith Jolley, James Bray and Martin Maiden (Univ. Oxford), Joerg Rothganger (RIDOM), Hannes Pouseele (Applied Maths)
Genome Canada IRIDA project (www.irida.ca)
Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NLM, PHAC); Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC); Fiona Brinkman (SFU); William Hsiao (BCCDC)
INTEGRATED RAPID INFECTIOUS DISEASE ANALYSIS